To our families


Preface 


The introduction of scientific evidence in legal proceedings raises a host of intricate 
questions and themes, ranging from the architecture of legal systems across contem- 
porary jurisdictions and psychological aspects of judgment and decision-making, to 
principles and methods of logical reasoning and decision-making under uncertainty. 
Over decades of theoretical and practice-oriented research, scholars in fields such 
as law, statistics, history, philosophy of science, psychology, and forensic science 
have come to the understanding that the sound use of scientific findings in evidence 
and proof processes critically depends on the ability of forensic scientists to use 
formal methods of reasoning, so as to ensure a coherent approach to dealing with 
and communicating about uncertainty. The focal point of these developments is the 
recognition of probability as the reference method for measuring uncertainty. 

It is thus hardly surprising that, in recent years, the intersection between law 
and forensic science has seen an increase in the number of reports, guidelines, and 
recommendations issued by eminent societies, review panels, and expert groups 
that insist on the importance of aligning the interpretation of scientific evidence by
forensic scientists with a probabilistic measure of the value of evidence.¹ This measure
is the likelihood ratio and has been widely described in peer-reviewed articles and 
textbooks. 

What is less often recognized, however, is that the likelihood ratio is merely a 
particular instance of a more general concept, known as the Bayes factor. While 
the likelihood ratio is typically presented in the focused context of evidence-based 
discrimination between pairs of competing propositions, the Bayes factor is a 
method of choice for approaching a more comprehensive collection of problems 
commonly associated with the use of measurements and data in forensic science.
Examples include the comparison of probabilistic models, model selection, and
decision-making regarding competing theories and model parameters. We believe
that by becoming acquainted with Bayes factors across a range of different
applications, forensic scientists can strengthen the use of probabilistic methods in
their respective disciplines. Forensic scientists should also gain an understanding of
the role of Bayes factors in coherent decision-making under uncertainty. The core
idea of this book on Bayes factors, the first on this theme in forensic science, is to
address these questions.


1 Examples include documents issued by the Royal Statistical Society (Aitken et al., 2010), The
Royal Society of Edinburgh (Nic Daéid et al., 2020), The UK Forensic Science Regulator (Tully,
2021), The European Network of Forensic Science Institutes (Willis et al., 2015), The Association
of Forensic Science Providers (Association of Forensic Science Providers, 2009), and expert
communities in particular sub-fields of forensic science, such as forensic genetics (e.g., Gill et al.,
2018) or forensic voice comparison (Drygajlo et al., 2015; Morrison et al., 2021).

Bayes Factors for Forensic Decision Analyses with R is a new Bayesian modeling 
book that provides a self-contained account of essential elements of computational 
Bayesian statistics using R, a leading programming language and a freely available 
software environment for statistical computing. This book features a well-rounded 
approach to three naturally interrelated topics. The first is probabilistic inference. 
As a core concept of Bayesian inferential statistics, Bayes factors are ideally suited 
to help forensic scientists think about the logical and balanced evaluation of the 
value of evidence. This is a necessary preliminary to coherent reporting on scientific 
evidence. Second, this book highlights the logical connection between probabilistic 
reasoning, using Bayes factors, and decision analysis under uncertainty. This 
perspective involves the decision-theoretic (re-)conceptualization of questions that, 
in classical statistics, are often framed as problems of hypothesis testing using a 
disparate set of concepts, such as p-values, that have a longstanding and well- 
documented history of misinterpretations by both scientists and recipients of expert 
information. Here, Bayes factors provide a sound and defensible alternative. The 
third theme that this book covers is operational relevance. Thus, throughout this 
book, all key concepts are systematically illustrated with hands-on examples and 
complete template code in R, including sensitivity analyses and explanations on 
how to interpret results in context. This usefully complements the theoretical and 
philosophical justifications for the coherent approach to inference and decision 
emphasized throughout this book. 

Besides explaining the role of the Bayes factor as a guide to reasoning and as 
a preliminary to coherent decision analysis, the original contribution of this book 
is to work out the relevance of these topics with respect to two main forensic 
areas of application: investigation and evaluation. The first, investigation, refers 
to discriminating between general propositions of interest, i.e., when no named 
person (or object) is available for comparative examinations with a given trace, 
mark, or impression of unknown source. The second, evaluation, is concerned 
with assessing the meaning of evidence with respect to specific propositions of 
interest, e.g., whether given trace material, a mark, or an impression comes from a 
particular person (or object), rather than from an unknown person (or object). While 
investigation and evaluation pertain to distinct procedural phases with specific needs 
and constraints, they involve inferential and decisional tasks that have common 
conceptual underpinnings that can be formally captured, analyzed, and expressed in 
terms of Bayes factors, and embedded in a coherent framework for decision analysis. 

This book does not contain recipes nor does it intend to prescribe what scientists 
should do. Instead, the aim of this book is to provide forensic scientists with 
a sound analytical framework for inference and decision analysis that allows
them to critically rethink their current approaches drawn from more traditional 
courses in probability and statistics. As prerequisites, readers should have a 
minimal background in probability and statistics including, ideally, notions from 
Bayesian statistics. With its balanced presentation of theoretical and philosophical 
background, together with practical illustrations, this concise book seeks to make 
an original contribution to forensic science literature. It will be of equal interest to 
forensic practitioners and applied forensic statisticians, and can be used to support 
courses on Bayesian statistics for forensic scientists. Occasionally, we will refer to 
datasets and computational routines, available as online supplementary materials on 
the book’s website at http://link.springer.com/. 

This book presents materials developed through a longstanding collaboration 
between the authors. Their research was supported, at various instances, by the 
Swiss National Science Foundation, the Foundation for the University of Lausanne 
(Fondation pour l’Université de Lausanne), the Vaud Academic Society (Société
Académique Vaudoise), the Department of Economics of Ca’ Foscari University 
of Venice, and the School of Criminal Justice of the University of Lausanne. The 
authors are deeply indebted to Colin Aitken and Daniel Ramos for their valuable 
advice, to Lorenzo Gaborini for sharing routines developed in his Ph.D. thesis, and
to Luc Besson, Jacques Linden, Raymond Marquis, Valentin Scherz, and Matthieu 
Schmittbuhl for sharing data of forensic interest. Finally, students and fellow 
researchers at Ca’ Foscari University of Venice and the University of Lausanne have 
provided the authors with exciting and encouraging environments without which 
much of the writing of this book would not have been possible. 


Venice, Italy Silvia Bozza 
Lausanne-Dorigny, Switzerland Franco Taroni 
Lausanne-Dorigny, Switzerland Alex Biedermann 


August 2022 
Chapter 1
Introduction to the Bayes Factor and
Decision Analysis


1.1 Introduction 


The assessment of the value of scientific evidence involves subtle forensic, sta- 
tistical, and computational aspects that can represent an obstacle in practical 
applications. The purpose of this book is to provide theory, examples, and elements 
of R code to illustrate a variety of topics pertaining to value of evidence assessments 
using Bayes factors in a decision-theoretic perspective. 

The structure of this book is as follows. This chapter starts by presenting an 
overview of the role of statistics in forensic science, with an emphasis on the 
Bayesian perspective and the role of the Bayes factor for logical inference and 
decision. Next, the chapter addresses three general topics that forensic scientists 
commonly encounter: model choice, evaluation, and investigation. For each of these 
themes, Bayes factors will be developed and discussed using practical examples. 
Particular attention will be devoted to the distinction between feature- and score- 
based Bayes factors, typically used in evaluative settings. This chapter also provides 
theoretical background analysts might need during data analysis, including elements 
of forensic interpretation, computational methods, decision theory, prior elicitation, 
and sensitivity analysis. 

Chapter 2 addresses the problem of discrimination between competing propo- 
sitions regarding target features of a population of interest (i.e., parameters). 
Examples include applications involving counting processes and propositions refer- 
ring to the proportion of items of forensic interest (e.g., items with illegal content) 
or an unknown quantity. Attention will be drawn to background elements that may 
affect counting processes or continuous measurements and a decisional approach to 
this problem. 

Chapter 3 addresses the problem of evaluation of scientific evidence in the form 
of discrete, continuous, and continuous multivariate data. The latter may present a 
complex dependence structure that will be handled by means of multilevel models.

Chapter 4 focuses on the problem of investigation, using examples involving
either univariate or multivariate data. 

For each topic covered in the book, examples will be accompanied by R
code, allowing readers to reproduce computations and adapt sample code to their
own problems. The end of each chapter presents an outline of the principal R
functions used in that chapter. While some functions can be
easily reproduced, others are more elaborate and copying their R code would
be tedious. These functions, together with the datasets, are available as supplementary
materials on the book’s website (http://link.springer.com/).


1.2 Statistics in Forensic Science 


Forensic science uses scientific principles and technical methods to help with the
use of evidence in legal proceedings of a criminal, civil, or administrative nature. To
assist members of the judiciary in their inquiries regarding the existence or past 
occurrence of events of legal interest, forensic scientists examine recovered traces, 
objects, and materials related to persons of interest. This may involve, for example, 
the analysis of the nature of body fluids and various other items such as textile 
fibers, glass and paint fragments, handwriting, digital device data, as well as the 
classification of such items and data into various categories. 

More generally, forensic science takes a major interest in both investigative pro- 
ceedings and evaluative processes at trial. This involves the examination of persons 
and objects, as well as the vestiges of actions. Forensic scientists also help with 
reconstructing past events. Thus, incomplete knowledge and, hence, uncertainty are 
key challenges that all participants in the legal process must deal with. The standard 
approach to cope with uncertainty is the structured collection and sound use of data. 
Typically, data result from the analysis and comparative examination of evidential 
material (i.e., biological traces, toxic substances, documents, crime scene findings, 
imaging data, etc.), followed by an assessment of the probative value of scientific 
results within the context of the event under investigation and in the light of the 
task-relevant information. 

However, despite its potential to support legal evidence and proof processes, 
forensic science has also been found to be a contributing factor to miscarriages of 
justice (Cole, 2014). Furthermore, over the last decade, reviews by expert panels 
have exposed several areas of forensic science practice as insufficiently reliable 
(e.g., PCAST, 2016), and courts across many jurisdictions have insisted on the need 
to probe and demonstrate the empirical foundations of forensic science disciplines. 

Scientists currently address these challenges not only by directing research
toward more studies involving experiments under controlled conditions but also by
developing formal frameworks for value of evidence assessment that can cope with
scientific evidence independent of its nature and type. Central to this development
is a convergence to the Bayesian perspective, which is well suited to help forensic
scientists assess the probative value of observations that, typically, do not arise
under only one given hypothesis or proposition.¹ Bayesian thinking can cope with
situations in which one holds varying degrees of belief about competing hypotheses 
and one considers that those hypotheses may differ in their capacity to account for 
one’s observations and findings. As noted by Cornfield (1967, p. 34), 


Bayes’ theorem is important because it provides an explication for this process of consistent 
choice between hypotheses on the basis of observations and for quantitative characterization 
of their respective uncertainties. 


In forensic science, the Bayes factor (BF)—a central element in Bayesian 
analysis—has come to play an extremely important role. It represents a key statistic 
for assessing the value of scientific findings and is, thereby, widely covered in 
forensic literature (e.g., Aitken et al., 2021; Buckleton et al., 2016). It allows 
scientists to assess case-related observations or measurements in the light of 
competing propositions presented by parties at trial. In essence, the Bayes factor 
is a concept that provides a measure of the degree to which a scientific finding is
capable of discriminating between the competing propositions of interest.

The choice of the Bayes factor to assess the value of outcomes of laboratory 
examinations and analyses results from the requirement to comply with several prac- 
tical precepts of coherent thinking and decision-making. The desirable properties 
that the Bayes factor accounts for are balance, transparency, robustness, and logic. 
In addition, it is a flexible measure, acknowledged throughout forensic science, law, 
and statistics, because it can deal with any type of evidence (e.g., Evett, 1996; 
Jackson, 2000; Robertson & Vignaux, 1993; Robertson et al., 2016; Good, 1950; 
Kass & Raftery, 1995; Lindley, 1977; Taroni et al., 2010). 

In forensic science, the Bayes factor is more commonly called the likelihood ratio,
although this may create confusion because the two terms represent two distinct
concepts, and the Bayes factor does not always simplify to a likelihood ratio. This
will be explained later in Sect. 1.4. Generally, the use of the Bayes factor is now well
established in both theory and practice, though some branches of forensic science 
are more advanced in Bayes factor analyses than others. A general overview is 
presented by the Royal Statistical Society’s Section Committee on Statistics and 
Law (e.g., Aitken et al., 2010) in a series of practitioner guides for judges, forensic 
scientists, and expert witnesses. 

1 The term hypothesis (or proposition) is interpreted here as an assertion or a statement that such
and such is the case (e.g., an outcome or a state of nature of the kind “the questioned document has
been printed with printer 1” or “the recovered item is from the same source as the control item”)
and also as a description of a decision. Propositions are, therefore, statements that are either true or
false and that can be affirmed or denied. An important basis for much of the argument developed
in this book is the assumption that personal degrees of belief can be assigned to propositions or
hypotheses. Throughout this book, hypothesis and proposition are treated as synonyms.


While the Bayes factor represents a coherent metric for value of evidence
assessment² in evaluative reporting³ (i.e., when a person of interest is available
for comparison purposes), it is important to mention that it can also be used in 
investigative contexts. A case is investigative when there is no person or object 
available for comparison, and examinations concentrate primarily on helping to 
draw inferences about general features (e.g., sex, right-/left-handedness, etc.) related 
to the source of a recovered stain, mark, or trace. More generally, the Bayes factor 
can be used for two main purposes in forensic science: 


• The first purpose is to assign a value to the result of a comparison between an item
of unknown source and an item from a known source. This refers to the evaluative 
mode in which forensic scientists operate. Evaluating a scientific finding thus 
means that the scientist provides an expression of the value of the observation 
in support—which may be positive, negative, or neutral—of a proposition of 
interest in legal proceedings, compared to a relevant alternative proposition. 

• The second purpose is to provide information in investigative proceedings. Here,
scientists operate in what is called investigative mode. They try to help answer 
questions such as “what happened?” and “what (material) is this?” (Jackson et al., 
2006). The scientist is said to be “event focused” and uses the findings to generate 
hypotheses and suggestions for explanations of observations, in order to give 
guidance to investigators or litigants. 


To illustrate these concepts, imagine a case involving a questioned document 
and handwriting. In cases of anonymous letter-writing, it regularly occurs that, at 
least initially, no suspected writer is available. In such a case, there will be no 
possibility for jointly evaluating characteristics observed on a questioned document 
and features on reference (known or control) material from a person of interest, 
as would be the case in an evaluative context. However, this does not mean that 
measurements made only on the questioned document, without comparison to 
reference material, could not be informative for investigative purposes. For example, 
features extracted from the handwriting of unknown source may be evaluated with 
respect to more general propositions such as “the questioned document (e.g., a 
ransom note) has been written by a man (woman)” or “the questioned document has 
been written by a right- (left)-handed person.” Helping to discriminate between such 
propositions contributes to reducing the pool of potential writers in an investigation. 

As a metric to assess the value of findings in a forensic context, the Bayes factor
allows practitioners to offer a quantitative expression that they can convey in a more 
general reasoning framework that conforms to the logic of Bayesian thinking. From 
the scientist’s point of view, the contribution to inference is perfectly symmetric. 
That is, the findings may support either of the two competing propositions, with
respect to the relevant alternative proposition. This strengthens the scientist’s role
as a balanced expert in the legal process.


2 A list of necessary logical conditions to guarantee coherence is presented and discussed in Taroni
et al. (2021a).

3 On the difference between evaluative and other types of reporting, such as technical and
intelligence reporting, see ENFSI Guideline for Evaluative Reporting in Forensic Science (Willis
et al., 2015) §1.1.


1.3 Bayesian Thinking and the Value of Evidence 


Bayesian philosophy is named after Reverend Thomas Bayes and is based on an 
interpretation of probability as personal degree of belief (de Finetti, 1989). In 
Bayesian theory, all uncertainties in a problem must necessarily be described by 
probabilities. Probability is intended as one’s conditional measure of uncertainty 
associated with the evidence, the available information, and all the underlying 
assumptions. In this book, we will use the term evidence in the general sense of a 
given piece of information or data. This includes, but is not restricted to, the idea of 
evidence used in legal proceedings. The term is thus used in a broad sense, as a
synonym for other terms such as “finding” or “outcome.” According to Good
(1988), evidence may be defined as data that makes one alter one’s beliefs about 
how the world is working. The word finding, in turn, is used in this book to designate 
the result of a forensic examination or analysis. Findings are measurements in a 
quantitative form, discrete or continuous. Examples of discrete quantitative results
are counts of glass fragments or gunshot residues. Examples of continuous results
are measurements of physical quantities such as length, weight, refractive index, 
and summaries of complex comparisons in the form of similarity scores. For a 
formal definition of the term findings, see also the ENFSI Guideline for Evaluative 
Reporting in Forensic Science (Willis et al., 2015). 

Starting from prior probabilities, representing subjective degrees of belief about 
propositions of interest, the Bayesian paradigm allows one to rationally revise such 
beliefs and compute posterior probabilities, draw inferences about propositions, and 
make decisions (Sprenger, 2016). For example, when new information becomes 
available, it may be necessary to assess how this information ought to affect 
propositions regarding the involvement of a person of interest in particular alleged 
activities. Likewise, physicians need to structure their thought processes when 
performing medical diagnosis. In general, the question is how to update one’s 
personal beliefs regarding uncertain events when one receives new information. 

Suppose that the events $H_1, \ldots, H_n$ form a partition, and denote by $\Pr(H_i \mid I)$
the probability that is associated with $H_i$, $i = 1, \ldots, n$, given relevant background
information $I$. This probability is called a prior probability. Furthermore, consider
an event or quantity $E$, whose probability can be expressed by means of the law of
total probability as


$$\Pr(E \mid I) = \sum_{j=1}^{n} \Pr(E \mid H_j, I) \Pr(H_j \mid I). \tag{1.1}$$


The ENFSI Guideline for Evaluative Reporting in Forensic Science (Willis et al., 
2015, at p. 21) regards conditioning information as the essential ingredient of probability
assignment, since all probabilities are conditional. In forensic evaluation, it
is important not to focus on all possible information, but only on the information 
that is relevant to the forensic task at hand. Disciplined forensic reporting requires 
scientists to make clear their perception of the conditioning information at the time 
they conduct their evaluation. Conditioning information is sometimes known as 
the framework of circumstances (or background information). Much of the non- 
scientific information will not have a bearing on the value of scientific findings, but 
it is essential to recognize those aspects that do. Examples of relevant information 
may include the ethnic origin of the perpetrator (but not that of the suspect) and the 
nature of garments and surfaces involved in alleged transfer events. More generally, 
conditioning information may also include data and domain knowledge that the 
expert uses to assign probabilities. The conditioning on (task-) relevant information 
I is important because it clarifies that probability assignments are personal and 
depend on the knowledge of the person conducting the evaluation. 

Bayes rule (or theorem) is a straightforward application of the conditionalization
principle and the partition formula (1.1). It allows one to compute the so-called
posterior probability $\Pr(H_i \mid E, I)$ as


$$\Pr(H_i \mid E, I) = \frac{\Pr(E \mid H_i, I) \Pr(H_i \mid I)}{\Pr(E \mid I)} = \frac{\Pr(E \mid H_i, I) \Pr(H_i \mid I)}{\sum_{j=1}^{n} \Pr(E \mid H_j, I) \Pr(H_j \mid I)},$$


which emphasizes that certain knowledge of $E$ modifies the probability of $H_i$.⁴ Note
that prior and posterior probabilities are only relative to the new finding $E$. The
posterior probability will become again a prior probability when additional findings
become available. Lindley (2000, p. 301) expressed this as follows: “Today’s
posterior is tomorrow’s prior.” Bayesian statistics is the sequential application
of Bayes rule to all situations that involve observed and missing data, unknown 
quantities (e.g., events, propositions, population parameters), or unobserved data 
(e.g., future observations). 
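
As a minimal numerical sketch of formula (1.1) and Bayes rule, the following R lines update probabilities over a partition of three hypotheses. All numbers, the object names, and the helper bayes_update() are hypothetical and chosen only for illustration. The second finding F is assumed to be conditionally independent of E given each hypothesis.

```r
# Illustrative prior probabilities Pr(H_i | I) over a partition H1, H2, H3
prior <- c(H1 = 0.5, H2 = 0.3, H3 = 0.2)

# Illustrative probabilities of the finding E under each hypothesis, Pr(E | H_i, I)
lik_E <- c(H1 = 0.8, H2 = 0.1, H3 = 0.4)

# Bayes rule: numerators Pr(E | H_i, I) Pr(H_i | I), divided by their sum,
# which is Pr(E | I) by the law of total probability (1.1)
bayes_update <- function(prior, lik) {
  joint <- lik * prior
  joint / sum(joint)
}

pr_E   <- sum(lik_E * prior)          # Pr(E | I)
post_E <- bayes_update(prior, lik_E)  # Pr(H_i | E, I)

# "Today's posterior is tomorrow's prior": a second finding F (assumed
# conditionally independent of E given each H_i) updates post_E
lik_F   <- c(H1 = 0.6, H2 = 0.7, H3 = 0.2)
post_EF <- bayes_update(post_E, lik_F)

# Under that assumption, a single update with the joint likelihood
# gives the same result
post_joint <- bayes_update(prior, lik_E * lik_F)
```

Under the stated independence assumption, the sequential and the joint update coincide, which is one way to read the remark that the posterior after E serves as the prior for the next finding.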

Participants in the legal process are typically concerned with the problem of 
comparing competing propositions about a contested event. A typical example for 
trace evidence is “the recovered glass fragments come from the broken window” 
versus “the recovered glass fragments come from an unknown source.” When 
measurements on various items (i.e., glass fragments) are available, it may be 
necessary to quantitatively evaluate these findings with respect to selected proposi- 
tions of interest. According to Bayesian methodology developed by Jeffreys (1961), 
this involves the introduction of a statistical model to describe the probability 
of the available measurements according to different hypotheses (propositions or 
models). The posterior probability of each hypothesis is then computed via a 
direct application of Bayes theorem. Following Jeffreys’ criterion for comparing 
hypotheses, a hypothesis is accepted or rejected on the basis of its posterior
probability being greater or smaller than that of the alternative proposition. Note that
the acceptance or rejection of a proposition is not meant as an assertion of its truth
or falsity, only that its probability is greater or smaller than that of the respective
alternative proposition (Press, 2003).


4 See Taroni et al. (2020) for a discussion on the generalization of Bayes rule (i.e., Jeffrey’s
conditionalization) when one is faced with uncertain evidence.

The primary element in Bayesian methodology for comparing propositions is the 
Bayes factor (BF for short). It provides a numerical representation of the impact 
of findings on propositions of interest. In other words, the Bayes factor quantifies 
the degree to which observed measurements discriminate between competing 
propositions. The Bayes factor is the ingredient by which the prior odds in favor 
of a proposition are multiplied in virtue of the knowledge of the findings (Good, 
1958): 


Posterior odds = BF × Prior odds.


Broadly speaking, prior and posterior odds are the ratios of probabilities of the 
hypotheses of interest before and after acquiring new findings, respectively. The 
value of experimental outcomes is measured by how much more probable they make 
one hypothesis relative to the respective alternative hypothesis, compared to the 
situation before considering the experimental findings. 
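
The odds form of Bayes theorem can be sketched in a few lines of R. The prior probability and the value of the Bayes factor below are purely illustrative, and the conversion between odds and probabilities assumes that the two hypotheses are mutually exclusive and exhaustive.

```r
# Illustrative prior probability of the first hypothesis, Pr(H1 | I)
prior_prob <- 0.2

# Prior odds of H1 to H2 (assuming H2 is the complement of H1)
prior_odds <- prior_prob / (1 - prior_prob)

# Hypothetical Bayes factor assigned to the findings
bf <- 100

# Posterior odds = BF x prior odds
posterior_odds <- bf * prior_odds

# Back-transform odds to a posterior probability for H1
posterior_prob <- posterior_odds / (1 + posterior_odds)
```

A Bayes factor of 1 would leave the odds unchanged, and a value below 1 would shift them toward the alternative hypothesis.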

A formal definition of the Bayes factor is given in Sect. 1.4, along with a 
discussion about its interpretation as a measure of the value of the evidence. Practical
examples in Sects. 1.5 and 1.6 and further developments in Chaps. 3 and 4 will
illustrate the use of the Bayes factor for evaluative and investigative purposes. 


1.4 Bayes Factor for Model Choice 


Consider an unknown quantity $X$, referring to a quantity or measurement of interest
such as the number of ecstasy pills in a sample drawn from a large seizure of
pills, the elemental chemical composition of glass fragments, or a feature (e.g., the
length) of a handwritten character. Furthermore, suppose that $f(x \mid \theta)$ is a suitable
probability model⁵ for $X$, where the unknown parameter⁶ $\theta$ belongs to the parameter
space $\Theta$. Suppose also that the parameter space consists of two non-overlapping sets
$\Theta_1$ and $\Theta_2$ such that $\Theta = \Theta_1 \cup \Theta_2$. A question that may be of interest is whether
the parameter $\theta$ belongs to $\Theta_1$ or to $\Theta_2$, that is, to compare the hypothesis


$$H_1 : \theta \in \Theta_1$$


against the alternative hypothesis


$$H_2 : \theta \in \Theta_2.$$


Note that $H_1$ is usually called the null hypothesis. Under a classical (frequentist)
approach, the distinction between null and alternative hypotheses is very important.
Users must be aware that when performing significance testing, competing hypotheses
are not equivalent and there is, in fact, an asymmetry associated with them. One
collects data (or evidence) against the null hypothesis before it is rejected, but the
acceptance of the null hypothesis is not an assertion about its truthfulness. It merely
means that there is little evidence against it. As will be shown, under the Bayesian
paradigm, this does not represent an issue.


5 A probability model is understood here as a characterization of the distribution of measurements.

6 A parameter is taken here as a characteristic of the distribution of all members (e.g., individuals
or objects) of a population of interest.

A hypothesis $H_i$ is called simple if there is only one possible value for $\theta$, say
$\Theta_i = \{\theta_i\}$. A hypothesis is called composite (see, e.g., Example 1.1) if there is more
than one possible value.

Let $\pi_1 = \Pr(H_1) = \Pr(\theta \in \Theta_1)$ and $\pi_2 = \Pr(H_2) = \Pr(\theta \in \Theta_2)$ denote the prior
probabilities for the competing composite hypotheses $H_1$ and $H_2$. Note that, for the
sake of simplicity, the letter $I$ denoting background information is omitted here. The
ratio of the prior probabilities $\pi_1/\pi_2$ is called the prior odds of $H_1$ to $H_2$. The prior
odds indicate whether hypothesis $H_1$ is more or less probable than hypothesis $H_2$
(prior odds being greater or smaller than 1) or whether the hypotheses are (almost)
equally probable, i.e., the prior odds are (close to) 1.⁷ Suppose observational data
$x$ are available that do not provide conclusive evidence⁸ about the propositions of
interest but will allow one to update prior beliefs using Bayes theorem. Let us denote
by $f_{H_i}(x)$ the marginal probability of the data under proposition $H_i$, that is,


$$f_{H_i}(x) = \int_{\Theta_i} f(x \mid \theta) \, \pi_{H_i}(\theta) \, d\theta, \tag{1.2}$$


where $\pi_{H_i}(\theta)$ denotes the prior probability density of $\theta$ for $\theta \in \Theta_i$. The marginal
probability is also called the predictive probability, that is, the probability of
observing the actual data, assigned before any data become available. Kass and Raftery (1995)
refer to it as the marginal likelihood: the probability of the observations averaged
across the prior distribution over the parameter space $\Theta$. Note that the parameter
space $\Theta$ can be either continuous or discrete. In the latter case, the integral in (1.2)
must be replaced by a sum, and the marginal probability of the evidence (i.e., data
$x$) becomes


$$f_{H_i}(x) = \sum_{\theta \in \Theta_i} f(x \mid \theta) \Pr(\theta \mid H_i).$$


7 The ratio of the probabilities of two mutually exclusive and collectively exhaustive events is
called odds in favor of the event whose probability is in the numerator of the ratio. Note that
hypotheses are not necessarily exhaustive: the word odds is sometimes used loosely in reference
to the ratio of the probabilities of mutually exclusive propositions whose probabilities do not add
to 1 (Taroni et al., 2010).


8 The problem of imperfect evidence is well illustrated by Robertson and Vignaux (1995, at p. 12):


An ideal piece of evidence would be something that always occurs when what we are trying
to prove is true and never occurs otherwise. If we are trying to demonstrate the truth of an
hypothesis or assertion we would like to find as evidence something which always occurs
when the hypothesis is true and never occurs when the hypothesis is not true. In real life,
evidence this good is almost impossible to find.


The Bayes factor for comparing $H_1$ and $H_2$ is defined as the ratio of the marginal 
probabilities $f_{H_i}(x)$ under the competing hypotheses, that is, 


$$
\mathrm{BF} = \frac{f_{H_1}(x)}{f_{H_2}(x)}. \tag{1.3}
$$


Let $\alpha_1 = \Pr(H_1 \mid x) = \Pr(\theta \in \Theta_1 \mid x)$ and $\alpha_2 = \Pr(H_2 \mid x) = \Pr(\theta \in \Theta_2 \mid x)$ 
denote the posterior probabilities for the competing hypotheses. The ratio of the 
posterior probabilities $\alpha_1/\alpha_2$ is called the posterior odds of $H_1$ to $H_2$. Recalling 
the odds form of Bayes theorem, one can express the Bayes factor for comparing 
hypothesis $H_1$ against hypothesis $H_2$ as the factor by which the prior odds of $H_1$ to 
$H_2$ are multiplied in virtue of the knowledge of the data to obtain the posterior odds, 
that is, 


$$
\frac{\alpha_1}{\alpha_2} = \mathrm{BF} \times \frac{\pi_1}{\pi_2}.
$$


The Bayes factor measures the change produced by the new information (or, data) 
in the odds when going from the prior to the posterior distributions in favor of one 
proposition as opposed to a given alternative. For this reason, it is not uncommon to 
find the BF defined as the ratio of the posterior odds in favor of $H_1$ to the prior odds 
in favor of $H_1$, that is, 


$$
\mathrm{BF} = \frac{\alpha_1/\alpha_2}{\pi_1/\pi_2}. \tag{1.4}
$$


One of the attractive features of using a Bayes factor to quantify the value of the 
acquired information is that it does not depend on prior probabilities of competing 
hypotheses. However, this bears potential for misunderstandings. The Bayes factor 
is sometimes interpreted as, for example, the odds provided by the data alone, for 
$H_1$ to $H_2$: this is conceptually incorrect. Though cases may be found where the 
Bayes factor can be expressed as a ratio of likelihoods⁹ and correctly be interpreted 
as the “summary of the evidence provided by the data in favor of one scientific 
theory (...) as opposed to another” (Kass & Raftery, 1995, at p. 777), this does not 
hold in general. The Bayes factor will generally depend on prior assumptions. It is 
necessary, thus, to clarify the meaning of “prior assumptions” because confusion 
may arise between, on the one hand, the notion of prior probability about model 
parameters ($\theta \in \Theta_i$) and, on the other hand, prior probabilities of propositions ($H_i$). 
To clarify this distinction, consider the comparison of a simple hypothesis 
$H_1: \theta = \theta_1$ against a simple alternative hypothesis $H_2: \theta = \theta_2$. The prior probabilities 
of these hypotheses are expressed as $\pi_1 = \Pr(\theta = \theta_1)$ and $\pi_2 = \Pr(\theta = \theta_2)$. The 
posterior probabilities $\alpha_i$ in the light of prior probabilities $\pi_i$ ($i = 1, 2$) and observed 
data x can be easily computed by means of a direct application of Bayes theorem: 


$$
\alpha_i = \frac{f(x \mid \theta_i)\, \pi_i}{\sum_{j=1,2} f(x \mid \theta_j)\, \pi_j}. \tag{1.5}
$$


9 While probabilistic modeling provides the probability $f(x \mid \theta)$ of any hypothetical data x before 
any observation is made, conditional on $\theta$, statistical methods allow one to draw conclusions 
about $\theta$ given the collected observations x. This difference in focus is expressed by the likelihood 
function, written $l(\theta \mid x)$, where the probability distribution $f(x \mid \theta)$ is written as a function of $\theta$ 
conditional on the observations x, i.e., $f(x \mid \theta) = l(\theta \mid x)$. 


The ratio of the posterior probabilities $\alpha_1/\alpha_2$ obtained from computing (1.5) for 
$i = 1, 2$ simplifies to the product of the likelihood ratio and the ratio of the prior 
probabilities, that is, 


$$
\frac{\alpha_1}{\alpha_2} = \frac{f(x \mid \theta_1)}{f(x \mid \theta_2)} \times \frac{\pi_1}{\pi_2}.
$$


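Recalling (1.4), it is readily seen that the Bayes factor in this simple case is the 
likelihood ratio of $H_1$ to $H_2$, 


$$
\mathrm{BF} = \frac{f(x \mid \theta_1)}{f(x \mid \theta_2)} \times \frac{\pi_1}{\pi_2} \Big/ \frac{\pi_1}{\pi_2} = \frac{f(x \mid \theta_1)}{f(x \mid \theta_2)}, \tag{1.6}
$$


and it is correct then to interpret this as “the odds provided by the data alone for $H_1$ 
to $H_2$.” 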
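That the Bayes factor in (1.6) does not depend on the prior probabilities can be checked numerically. The following minimal Python sketch is an illustration only: the binomial model, the data (7 successes in 20 trials), and the values $\theta_1 = 0.3$ and $\theta_2 = 0.6$ are hypothetical choices that do not come from the text.

```python
from math import comb

def binom_pmf(k, n, theta):
    """Binomial probability f(x | theta) of k successes in n trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def bf_from_posteriors(lik1, lik2, pi1):
    """Posterior probabilities via (1.5), then BF as posterior odds over prior odds (1.4)."""
    pi2 = 1 - pi1
    norm = lik1 * pi1 + lik2 * pi2
    alpha1, alpha2 = lik1 * pi1 / norm, lik2 * pi2 / norm
    return (alpha1 / alpha2) / (pi1 / pi2)

# Hypothetical data and parameter values (assumptions made for illustration).
n, k = 20, 7
lik1 = binom_pmf(k, n, 0.3)  # f(x | theta_1)
lik2 = binom_pmf(k, n, 0.6)  # f(x | theta_2)
likelihood_ratio = lik1 / lik2

for pi1 in (0.1, 0.5, 0.9):
    bf = bf_from_posteriors(lik1, lik2, pi1)
    # The BF coincides with the likelihood ratio whatever the value of pi1.
    print(f"pi1 = {pi1}: BF = {bf:.4f}, likelihood ratio = {likelihood_ratio:.4f}")
```

Whatever prior probability is entered, the ratio of posterior odds to prior odds returns the same likelihood ratio, as (1.6) states.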

However, the comparison of simple versus simple hypotheses is a particular case 
among many others. Practitioners may face the more general situation where at least 
one of the hypotheses is composite, that is, the parameter of interest may take one of 
a range of different values (e.g., $\Theta_i = \{\theta_1, \ldots, \theta_k\}$), or infinitely many, as is the case 
when $\theta$ is continuous. In the case of composite hypotheses, the prior probabilities 
$\pi_i$ for $i = 1, 2$ will take the following form: 


$$
\pi_i = \Pr(\theta \in \Theta_i) =
\begin{cases}
\sum_{\theta \in \Theta_i} \Pr(\theta) & \text{for } \theta \text{ discrete,} \\
\int_{\Theta_i} \pi(\theta)\, d\theta & \text{for } \theta \text{ continuous,}
\end{cases} \tag{1.7}
$$


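where $\pi(\theta)$ is the prior probability density for $\theta \in \Theta$. The posterior probabilities $\alpha_i$ 
are therefore computed as 


$$
\alpha_i = \Pr(\theta \in \Theta_i \mid x) =
\begin{cases}
\dfrac{\sum_{\theta \in \Theta_i} f(x \mid \theta) \Pr(\theta)}{\sum_{\theta \in \Theta} f(x \mid \theta) \Pr(\theta)} & \text{for } \theta \text{ discrete,} \\[2ex]
\dfrac{\int_{\Theta_i} f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_{\Theta} f(x \mid \theta)\, \pi(\theta)\, d\theta} & \text{for } \theta \text{ continuous,}
\end{cases} \tag{1.8}
$$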


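and the posterior odds will be 


$$
\frac{\alpha_1}{\alpha_2} =
\begin{cases}
\dfrac{\sum_{\theta \in \Theta_1} f(x \mid \theta) \Pr(\theta)}{\sum_{\theta \in \Theta_2} f(x \mid \theta) \Pr(\theta)} & \text{for } \theta \text{ discrete,} \\[2ex]
\dfrac{\int_{\Theta_1} f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_{\Theta_2} f(x \mid \theta)\, \pi(\theta)\, d\theta} & \text{for } \theta \text{ continuous.}
\end{cases} \tag{1.9}
$$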


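Following (1.4), the Bayes factor can be reconstructed as follows: 


$$
\mathrm{BF} =
\begin{cases}
\dfrac{\sum_{\theta \in \Theta_1} f(x \mid \theta) \Pr(\theta)}{\sum_{\theta \in \Theta_2} f(x \mid \theta) \Pr(\theta)} \Big/ \dfrac{\pi_1}{\pi_2} & \text{for } \theta \text{ discrete,} \\[2ex]
\dfrac{\int_{\Theta_1} f(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_{\Theta_2} f(x \mid \theta)\, \pi(\theta)\, d\theta} \Big/ \dfrac{\pi_1}{\pi_2} & \text{for } \theta \text{ continuous,}
\end{cases} \tag{1.10}
$$


where the $\pi_i$ are computed as in (1.7). It is seen that the Bayes factor can no longer 
be expressed as a likelihood ratio as in the case of comparing simple versus simple 
hypotheses. We will show this for the case where $\theta$ is continuous. 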

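Start with the prior probability density $\pi(\theta)$ on $\Theta$, and divide it by the probability 
$\pi_i$ of the hypothesis $H_i$ to obtain the restriction of the prior probability density $\pi(\theta)$ 
on $\Theta_i$, that is, 


$$
\pi_{H_i}(\theta) = \frac{\pi(\theta)}{\pi_i} \quad \text{for } \theta \in \Theta_i.
$$


The probability density $\pi_{H_i}(\theta)$ simply describes how the prior probability spreads 
over the hypothesis $H_i$. The prior probability density $\pi(\theta)$ can thus be rewritten in 
the following form: 


$$
\pi(\theta) =
\begin{cases}
\pi_1\, \pi_{H_1}(\theta) & \text{for } \theta \in \Theta_1, \\
\pi_2\, \pi_{H_2}(\theta) & \text{for } \theta \in \Theta_2.
\end{cases}
$$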


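Therefore, the posterior odds in (1.9) for the continuous case can be rewritten as 


$$
\frac{\alpha_1}{\alpha_2} = \frac{\pi_1 \int_{\Theta_1} f(x \mid \theta)\, \pi_{H_1}(\theta)\, d\theta}{\pi_2 \int_{\Theta_2} f(x \mid \theta)\, \pi_{H_2}(\theta)\, d\theta}. \tag{1.11}
$$


Recalling (1.4), the Bayes factor in (1.10) will take the form of integrated likelihoods 
under the hypotheses of interest, that is, 


$$
\mathrm{BF} = \frac{\int_{\Theta_1} f(x \mid \theta)\, \pi_{H_1}(\theta)\, d\theta}{\int_{\Theta_2} f(x \mid \theta)\, \pi_{H_2}(\theta)\, d\theta}. \tag{1.12}
$$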
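The agreement between (1.10) and (1.12) can also be checked numerically. The sketch below is an illustration under stated assumptions: a binomial likelihood with hypothetical data (7 successes in 20 trials), a uniform prior density on $[0, 1]$, a hypothetical threshold $\theta_0 = 0.3$ separating $\Theta_1 = [0, \theta_0]$ from $\Theta_2 = (\theta_0, 1]$, and a simple midpoint rule for the integrals. None of these choices come from the text.

```python
from math import comb

def likelihood(theta, k=7, n=20):
    """Binomial likelihood f(x | theta); k and n are hypothetical data."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def prior_density(theta):
    """Uniform prior density pi(theta) on [0, 1] (an assumption)."""
    return 1.0

def integrate(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

theta0 = 0.3  # hypothetical threshold
pi1 = integrate(prior_density, 0.0, theta0)  # Pr(H1), as in (1.7)
pi2 = integrate(prior_density, theta0, 1.0)  # Pr(H2), as in (1.7)

def weighted(theta):
    return likelihood(theta) * prior_density(theta)

# (1.10): posterior odds divided by prior odds.
posterior_odds = integrate(weighted, 0.0, theta0) / integrate(weighted, theta0, 1.0)
bf_1_10 = posterior_odds / (pi1 / pi2)

# (1.12): ratio of likelihoods integrated against the restricted priors pi_{H_i}.
m1 = integrate(lambda t: likelihood(t) * prior_density(t) / pi1, 0.0, theta0)
m2 = integrate(lambda t: likelihood(t) * prior_density(t) / pi2, theta0, 1.0)
bf_1_12 = m1 / m2

print(bf_1_10, bf_1_12)
```

Both routes give the same value up to floating-point error, because dividing $\pi(\theta)$ by $\pi_i$ inside each integral is the same operation as dividing the posterior odds by the prior odds.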


The reader can verify that the two expressions in (1.3) and (1.12) are equivalent. 
Prior evaluations enter the Bayes factor through the weights $\pi_{H_1}(\theta)$ and $\pi_{H_2}(\theta)$. 
The Bayes factor depends on how the prior mass is spread over the two hypotheses 
(Berger, 1985). It is also worth noting that whenever hypotheses are unidirectional 
(e.g., when comparing $H_1: \theta \le \theta_0$ against $H_2: \theta > \theta_0$), the choice of a prior 
probability density $\pi(\theta)$ over $\Theta = \Theta_1 \cup \Theta_2$ (with $\Theta_1 = [0, \theta_0]$ and $\Theta_2 = (\theta_0, 1]$) 
is equivalent to the expression of a prior probability for the competing hypotheses. 
Conversely, whenever hypotheses are bidirectional (e.g., when comparing 
$H_1: \theta = \theta_0$ against $H_2: \theta \ne \theta_0$), one cannot choose a prior probability density $\pi(\theta)$ over the 
entire parameter space $\Theta$, as this would amount to assigning a probability equal to 0 to 
the hypothesis $H_1: \theta = \theta_0$. The prior probability distribution over $\theta$ must, in this 
case, be a mixture of a discrete component that assigns a positive mass 
$\pi_1 = \Pr(\theta = \theta_0)$ to $H_1$ and a continuous component that spreads the remaining mass $\pi_2 = 1 - \pi_1$ 
over $\Theta$ according to the probability density $\pi_{H_2}(\theta)$. The posterior probability $\alpha_1$ 
can then be computed as in (1.8), where $\Theta_1 = \{\theta_0\}$, 


$$
\alpha_1 = \Pr(H_1 \mid x) = \frac{\pi_1 f(x \mid \theta_0)}{\pi_1 f(x \mid \theta_0) + \pi_2 \int_{\Theta_2} f(x \mid \theta)\, \pi_{H_2}(\theta)\, d\theta}. \tag{1.13}
$$


Analogously, the posterior probability $\alpha_2$ may be computed, and the Bayes factor is 


$$
\mathrm{BF} = \frac{f(x \mid \theta_0)}{\int_{\Theta_2} f(x \mid \theta)\, \pi_{H_2}(\theta)\, d\theta}. \tag{1.14}
$$
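As a hedged illustration of (1.13) and (1.14), consider a point hypothesis $H_1: \theta = \theta_0$ against $H_2: \theta \ne \theta_0$ for binomial data. In the Python sketch below, the data (7 successes in 20 trials), the value $\theta_0 = 0.5$, and the uniform density $\pi_{H_2}(\theta)$ are assumptions made for illustration. With a uniform density, the marginal probability under $H_2$ has the known closed form $1/(n + 1)$, which serves as a check on the numerical integration.

```python
from math import comb

def likelihood(theta, k, n):
    """Binomial likelihood f(x | theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def integrate(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical data and point-hypothesis value (assumptions).
n, k, theta0 = 20, 7, 0.5

f_null = likelihood(theta0, k, n)  # numerator of (1.14)
# Denominator of (1.14) with a uniform pi_{H2}; the single point theta0
# has zero measure, so integrating over [0, 1] gives the same result.
marginal_h2 = integrate(lambda t: likelihood(t, k, n) * 1.0, 0.0, 1.0)
bf = f_null / marginal_h2

def posterior_h1(pi1):
    """Posterior probability of H1 as in (1.13) for a prior mass pi1 on theta0."""
    return pi1 * f_null / (pi1 * f_null + (1 - pi1) * marginal_h2)

print(f"BF = {bf:.4f}, closed-form marginal = {1 / (n + 1):.6f}")
print(f"Pr(H1 | x) with pi1 = 0.5: {posterior_h1(0.5):.4f}")
```

The Bayes factor here depends on the density chosen for $\pi_{H_2}(\theta)$ but not on the prior mass $\pi_1$, which only enters when the posterior probability is computed.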


It can be observed that the Bayes factor in (1.14) does not depend on the 
prior probabilities of competing hypotheses which can vary considerably among 
recipients of expert information. Any such recipient can, starting from their own 
probabilities, use the Bayes factor to obtain posterior probabilities in a straightforward 
manner. Consider, for the sake of illustration, the posterior probability of 
hypothesis $H_1$ in (1.13). A simple manipulation allows one to obtain 


$$
\alpha_1 = \frac{\mathrm{BF}}{\mathrm{BF} + \pi_2/\pi_1}.
$$
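The following short Python sketch illustrates how different recipients of the same expert information could combine one Bayes factor with their own prior probabilities. The value BF = 10 and the three prior probabilities are hypothetical.

```python
def posterior_h1(bf, pi1):
    """Posterior probability of H1 given a Bayes factor and a prior probability pi1."""
    pi2 = 1 - pi1
    return bf / (bf + pi2 / pi1)

bf = 10.0  # hypothetical Bayes factor reported by the scientist
for pi1 in (0.01, 0.2, 0.5):  # hypothetical priors held by different recipients
    print(f"prior Pr(H1) = {pi1:.2f} -> posterior Pr(H1 | x) = {posterior_h1(bf, pi1):.4f}")
```

The same Bayes factor leads to quite different posterior probabilities, which is why the scientist reports the Bayes factor and leaves the prior probabilities to the recipient.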


In summary, the Bayes factor thus measures the change in the odds in favor of one 
hypothesis, as compared to a given alternative hypothesis, when going from the prior 
to the posterior distribution. This means that a Bayes factor larger than 1 indicates 
that the data support hypothesis $H_1$ compared to $H_2$. However, the Bayes factor 
does not indicate whether $H_1$ is more probable than the opposing hypothesis $H_2$: 
the BF only makes it more probable than it was before observing the data (Lavine 
& Schervish, 1999). 


Example 1.1 (Alcohol Concentration in Blood) A person is stopped because 
of suspicion of driving under the influence of alcohol. Blood taken from that 
person is submitted to a forensic laboratory to investigate whether the quantity 
of alcohol in blood $\theta$ is greater than a legal threshold of, say, 0.5 g/kg. Thus, 
the hypotheses of interest can be defined as $H_1: \theta > 0.5$ versus $H_2: \theta \le 0.5$. 
Suppose that a prior probability density $\pi(\theta)$ is given for $\theta$ and that the 
prior probabilities of $H_1$ and $H_2$ in (1.7) are $\pi_1 = 0.05$ and $\pi_2 = 0.95$, 
corresponding to prior odds approximately equal to 0.0526. These values 
suggest that, based on the circumstances, and before considering results of 
blood analyses, the hypothesis $H_1$ is believed to be much less probable 
than the alternative hypothesis. Suppose next that the posterior probabilities, 
after taking into account laboratory measurements, are computed as in (1.8). 
The results are $\alpha_1 = 0.24$ and $\alpha_2 = 0.76$. Thus, the posterior odds are 
approximately equal to 0.3158. The ratio of the posterior odds to the prior 
odds leads to a BF equal to 6. This result represents limited evidence in 
support of the hypothesis that the alcohol level in blood is greater than the 
legal threshold, compared to the alternative hypothesis. Still, the posterior 
probability of hypothesis $H_1$ is low: the BF only renders the hypothesis $H_1$ 
slightly more probable than it was before observing the measurements made 
in the laboratory. This example will be further developed in Chap. 2. 
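The arithmetic of Example 1.1 can be retraced in a few lines of Python. The sketch uses only the probabilities stated in the example; the variable names are illustrative.

```python
# Prior and posterior probabilities as stated in Example 1.1.
pi1, pi2 = 0.05, 0.95
alpha1, alpha2 = 0.24, 0.76

prior_odds = pi1 / pi2              # approximately 0.0526
posterior_odds = alpha1 / alpha2    # approximately 0.3158
bf = posterior_odds / prior_odds    # ratio of posterior odds to prior odds, as in (1.4)

print(round(prior_odds, 4), round(posterior_odds, 4), round(bf, 2))
```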


1.5 Bayes Factor in the Evaluative Setting 


Consider the general situation where evidentiary material is collected and control 
items from a person or object of interest are available for comparative purposes. The 
following measurements of a particular characteristic are available: measurements 
y on a questioned item (e.g., a glass fragment found on the clothing of a person 
of interest) and measurements x on a control item (e.g., fragments from a broken 
window). In this evaluative setting, so-called source level propositions¹⁰ could be 
defined as follows: 


10 The notion of source level refers to a given level in a hierarchy of hypotheses. This view 
considers a classification (i.e., hierarchy) of propositions into three main categories or levels, called 
the source level, activity level, and crime level. See Cook et al. (1998) for a discussion. Note that 
source level propositions for the example of glass fragments are chosen here as a formative example 
and for illustrative purposes. As a type of transfer evidence, glass fragments should be evaluated 
using activity level propositions (Willis et al., 2015). 


$H_1$: The recovered (i.e., questioned) item comes from the same source as the 
control item. 

$H_2$: The recovered (i.e., questioned) item comes from an unknown source (i.e., 
different from the control item). 


This setting is called evaluative because it involves the comparison between 
control and recovered items and the use of the results of this comparison for 
discriminating between the competing propositions. Models for comparison can 
either be feature-based or score-based. Feature-based models (Sect. 1.5.1) focus on 
the probability of measurements made directly on evidentiary and reference items. 
Conversely, score-based models (Sect. 1.5.2) focus on the probability of observing 
a pairwise similarity (or distance), i.e., score, between compared materials. 


1.5.1 Feature-Based Models 


If one assumes that y and x are realizations of random variables Y and X with a 
given probability distribution $f(\cdot)$, the Bayes factor is 


$$
\mathrm{BF} = \frac{f(y, x \mid H_1, I)}{f(y, x \mid H_2, I)}, \tag{1.15}
$$


where I represents the available background information. Application of the rules 
of conditional probability allows one to rewrite the Bayes factor as follows: 


$$
\mathrm{BF} = \frac{f(y \mid x, H_1, I)}{f(y \mid x, H_2, I)} \times \frac{f(x \mid H_1, I)}{f(x \mid H_2, I)}.
$$

This expression can be further simplified by considering the fact that (i) the 
distribution of measurements x on the control item does not depend on whether $H_1$ 
or $H_2$ is true (and hence $f(x \mid H_1, I) = f(x \mid H_2, I)$ holds) and (ii) the distribution 
of the measurement y on the questioned item does not depend on the measurement 
x on the control item if $H_2$ is true,¹¹ so that $f(y \mid x, H_2, I) = f(y \mid H_2, I)$. The 
Bayes factor can therefore be written as 


$$
\mathrm{BF} = \frac{f(y \mid x, H_1, I)}{f(y \mid H_2, I)}. \tag{1.16}
$$


11 Note that this assumption of independence is not always valid, e.g., with DNA evidence (Balding 
& Nichols, 1994; Aitken et al., 2021). A further example is the case of questioned signatures. Under 
the proposition that a signature has been forged and therefore is not authentic, one should take into 
account that a forger will attempt to reproduce the features of a target signature. Thus, recovered 
and control measurements cannot be considered independent (Linden et al., 2021); see Sect. 3.4.3. 

The numerator is the probability of observing the measurements on the recovered 
item under the assumption that it comes from the known source, given the information 
I and knowledge of x, the features of the known source. The denominator is 
the probability of observing the measurements y on the recovered item, assuming 
that it comes from an unknown source, usually selected in some aleatory way from 
a relevant population,¹² and assuming again the relevant information I. Note that, 
for the sake of simplicity, the conditioning information I will be omitted in the 
arguments hereafter. 

For many types of forensic evidence, it can be reasonable to assume a parametric 
model $\{f(\cdot \mid \theta), \theta \in \Theta\}$. In this way, the probability distribution characterizing 
the available data is of a known form, with the only unknown element being 
the parameter $\theta$, which may vary between sources. Consider, for example, the 
probability distribution $f(\cdot \mid \theta)$ with unknown parameter $\theta = \theta_y$ for the 
measurements y on the recovered item and the same probability distribution with 
unknown parameter $\theta = \theta_x$ for the measurements x on the control item. In 
practice, the parameter $\theta$ is unknown, and a prior probability distribution $\pi(\theta \mid H_i)$, 
representing personal beliefs about $\theta$ under each hypothesis $H_i$, is introduced. The 
marginal distribution $f(y \mid x, H_1)$ in the numerator of (1.16) may be rewritten as 
follows: 


$$
\begin{aligned}
f(y \mid x, H_1) &= \int f(y \mid \theta)\, \pi(\theta \mid x, H_1)\, d\theta \\
&= \int f(y \mid \theta)\, f(x \mid \theta)\, \pi(\theta \mid H_1)\, d\theta \Big/ f(x \mid H_1),
\end{aligned} \tag{1.17}
$$


where the posterior density $\pi(\theta \mid x, H_1)$ in the first line is rewritten in extended 
form using Bayes theorem. The distribution $f(y \mid x, H_1)$ is also called a posterior 
predictive distribution.¹³ 

The marginal distribution $f(y \mid H_2)$ in the denominator of (1.16) can be rewritten 
as follows: 


$$
f(y \mid H_2) = \int f(y \mid \theta)\, \pi(\theta \mid H_2)\, d\theta. \tag{1.18}
$$


This is also called a predictive distribution. 


12 Note that rules of conditional probability do not specify on which variable we should condition. 
Champod et al. (2004) suggest that we should condition on the item with greater information 
content. Therefore we usually condition on the control item (e.g., in the case of DNA, traces can 
be degraded or of small quantity, while a complete profile can usually be obtained for a person of 
interest). For further comments, see also Aitken et al. (2021, pp. 619-627). 

13 For a discussion of posterior predictive distributions in forensic science contexts, see, e.g., 
Biedermann et al. (2015). 


Example 1.2 (Toner on Printed Documents) Suppose experimental findings 
are available in the form of measurements of magnetism of toner on printed 
documents of known origin (x) and questioned origin (y) for which a Normal 
distribution is considered suitable. Thus, $X \sim N(\theta_x, \sigma^2)$ and $Y \sim N(\theta_y, \sigma^2)$, 
where the variance $\sigma^2$ of both distributions is assumed known and equal 
(Biedermann et al., 2016a). A Normal distribution with mean $\mu$ and variance 
$\tau^2$ is taken to model our prior uncertainty about the means $\theta_x$ and $\theta_y$, that 
is, $\theta \sim N(\mu, \tau^2)$ for $\theta = \{\theta_x, \theta_y\}$. The integrals in (1.17) and (1.18) have 
an analytical solution, and the marginals can be obtained in closed form. See 
Aitken et al. (2021, pp. 815-817) for more details. 

Here, $H_1$ and $H_2$ denote the propositions according to which the items 
of toner come from, respectively, the same and different printing machines. 
Consider, first, the numerator of the BF, given in (1.17), where the posterior density 
$\pi(\theta \mid x, H_1)$ is still a Normal distribution with mean $\mu_x$ and variance $\tau_x^2$, 
computed according to well-known updating rules (see, e.g., Lee, 2012), 


$$
\mu_x = \frac{\sigma^2}{\sigma^2 + \tau^2}\, \mu + \frac{\tau^2}{\sigma^2 + \tau^2}\, x \tag{1.19}
$$


and 


$$
\tau_x^2 = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}. \tag{1.20}
$$


The posterior mean, $\mu_x$, is a weighted average of the prior mean $\mu$ and the 
observation x. The weights are given by the population variance $\sigma^2$ and the 
variance $\tau^2$ of the prior probability distribution, respectively, such that the 
component (observation or prior mean) which has the smaller variance has 
the greater contribution to the posterior mean. This result can be generalized 
to consider the distribution of the mean of a set of n observations $x_1, \ldots, x_n$ 
from the same Normal distribution (see Sect. 2.3.1). 

The marginal or posterior predictive distribution $f(y \mid x, H_1)$ is also a 
Normal distribution with mean equal to the posterior mean $\mu_x$ and variance 
equal to the sum of the posterior variance $\tau_x^2$ and the population variance $\sigma^2$, 
that is, 


$$
(Y \mid x, H_1) \sim N(\mu_x, \tau_x^2 + \sigma^2). \tag{1.21}
$$


The same arguments apply to the marginal or predictive distribution $f(y \mid H_2)$ 
in the denominator, which is a Normal distribution with mean equal to 
the prior mean $\mu$ and variance equal to the sum of the prior variance $\tau^2$ and 
the population variance $\sigma^2$, that is, 


$$
(Y \mid H_2) \sim N(\mu, \tau^2 + \sigma^2). \tag{1.22}
$$


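The Bayes factor can then be obtained as follows: 


$$
\begin{aligned}
\mathrm{BF} &= \frac{N(y \mid \mu_x, \tau_x^2 + \sigma^2)}{N(y \mid \mu, \tau^2 + \sigma^2)} \\
&= \frac{(\tau_x^2 + \sigma^2)^{-1/2} \exp\left[-\dfrac{(y - \mu_x)^2}{2(\tau_x^2 + \sigma^2)}\right]}{(\tau^2 + \sigma^2)^{-1/2} \exp\left[-\dfrac{(y - \mu)^2}{2(\tau^2 + \sigma^2)}\right]}.
\end{aligned}
$$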


Note that this can be easily extended to cases with multiple measurements 
$y = (y_1, \ldots, y_n)$ (see Sect. 3.3.1). 
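The closed-form Bayes factor of Example 1.2 can be sketched in Python as follows. This is a minimal illustration for a single measurement on each item, following (1.19) to (1.22); the function names and all numerical values are hypothetical and are not taken from the toner data of the cited study.

```python
from math import exp, pi, sqrt

def normal_pdf(y, mean, var):
    """Density of a Normal distribution with the given mean and variance, evaluated at y."""
    return exp(-((y - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def toner_bf(y, x, mu, tau2, sigma2):
    """Bayes factor for one questioned measurement y and one control measurement x."""
    mu_x = (sigma2 * mu + tau2 * x) / (sigma2 + tau2)   # posterior mean, (1.19)
    tau2_x = sigma2 * tau2 / (sigma2 + tau2)            # posterior variance, (1.20)
    numerator = normal_pdf(y, mu_x, tau2_x + sigma2)    # posterior predictive, (1.21)
    denominator = normal_pdf(y, mu, tau2 + sigma2)      # prior predictive, (1.22)
    return numerator / denominator

# Hypothetical prior mean, prior variance, and measurement variance.
mu, tau2, sigma2 = 0.0, 1.0, 1.0
print(toner_bf(y=3.0, x=3.0, mu=mu, tau2=tau2, sigma2=sigma2))   # y close to x
print(toner_bf(y=-3.0, x=3.0, mu=mu, tau2=tau2, sigma2=sigma2))  # y far from x
```

With these values, a questioned measurement that agrees with the control measurement and lies away from the prior mean gives a Bayes factor above 1, whereas a questioned measurement far from the control gives a value below 1.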


Note that the value of the measurements y and x may be expressed as a ratio of 
the marginal likelihoods in (1.17) and (1.18), that is, 


$$
\begin{aligned}
\mathrm{BF} &= \frac{\int f(y \mid \theta)\, f(x \mid \theta)\, \pi(\theta \mid H_1)\, d\theta}{f(x \mid H_1)} \times \frac{1}{f(y \mid H_2)} \\
&= \frac{\int f(y \mid \theta)\, f(x \mid \theta)\, \pi(\theta \mid H_1)\, d\theta}{\int f(x \mid \theta)\, \pi(\theta \mid H_2)\, d\theta \int f(y \mid \theta)\, \pi(\theta \mid H_2)\, d\theta},
\end{aligned} \tag{1.23}
$$


as $f(x \mid H_1) = f(x \mid H_2)$. If the recovered item and the control item come 
from the same source (i.e., hypothesis $H_1$ holds), then $\theta_y = \theta_x$, otherwise $\theta_y \ne \theta_x$ 
(i.e., hypothesis $H_2$ holds). If $H_2$ is true and hence the examined items come from 
different sources, the measurements can be considered independent. Note, however, 
that this is not necessarily the case. There are instances where the assumption of 
independence among measurements on control and recovered material under $H_2$ 
does not hold, and the BF will not simplify as in (1.23). See Linden et al. (2021) for 
a discussion about this issue in the context of questioned signatures. 

The expression of the Bayes factor in (1.23) involves prior assessments about 
the unknown parameter $\theta$, in terms of $\pi(\theta \mid H_i)$, as well as the likelihood function 
$f(\cdot \mid \theta)$. Thus, the Bayes factor cannot generally be regarded as a measure of the 
relative support to competing propositions provided by the data alone. 


1.5.2 Score-Based Models 


For some types of forensic evidence, the specification of a probability model for 
available data may be difficult. This is the case, for example, when the measurements 
are obtained using high-dimensional quantification techniques, e.g., for 
fingermarks or toolmarks (using complex sets of variables), in speaker recognition, 
or for traces such as glass, drugs or toxic substances that may be described by 
several chemical components. In such applications, a feature-based Bayes factor 
(Sect. 1.5.1) may not be feasible, and a score-based approach may represent a 
practicable (or even the only) available alternative. Broadly speaking, a score is 
a metric that summarizes the result of a forensic comparison of two items or traces, 
in terms of a single variable, representing a measure of similarity or difference (e.g., 
distance). Various distance measures can be used, such as Euclidean or Manhattan 
distance, see, e.g., Bolck et al. (2015).¹⁴ One of the first proposals of score-based 
approaches in forensic science was presented in the context of forensic speaker 
recognition by Meuwly (2001). 

Let $\Delta(\cdot)$ denote the function which assesses the degree of similarity between 
feature vectors x and y. The similarity score $\Delta(x, y)$ represents the evidence for 
which a Bayes factor is to be computed. The introduction of a score function 
for quantifying the similarities/dissimilarities between compared items allows one 
to reduce the dimensionality of the problem, while retaining the discriminative 
information as much as possible. For a score given by a distance, for example, one 
will expect a value close to zero if the features x and y relate to items from the same 
source. Vice versa, if the features x and y relate to items from different sources, one 
will expect a larger score, provided that there are differences between members in a 
population. The score-based Bayes factor (sBF) is 


$$
\mathrm{sBF} = \frac{g(\Delta(x, y) \mid H_1, I)}{g(\Delta(x, y) \mid H_2, I)}, \tag{1.24}
$$


where $g(\cdot)$ denotes the probability distribution associated with $\Delta(X, Y)$. For the 
sake of simplicity, the conditioning information I will be omitted hereafter. 

For the Bayes factor in (1.24), one cannot reproduce the simplified expression 
that was derived in (1.16) for the feature-based Bayes factor. The score-based 
Bayes factor must be computed as the ratio of two probability density functions 
evaluated at the evidence score $\Delta(x, y)$, given the competing propositions $H_1$ and 
$H_2$. Since these two distributions are not generally available by default, the forensic 
examiner will generally try to derive an sBF using sampling distributions based on 
many scores produced under each of the two competing propositions. One way to 
compute the density of the score $\Delta(x, y)$ in the numerator is to generate many scores 
for comparisons between the known features x and the features y of other items 
known to come from the potential source assumed under $H_1$. The numerator can 
therefore be written as $\hat{g}(\Delta(x, y) \mid x, H_1)$, where $\hat{g}(\cdot)$ indicates that the distribution 
is constructed on the basis of relevant data (scores) produced for the case of interest. 


14 The score can also be interpreted as the inner product of two vectors (Neumann & Ausdemore, 
2020). 

In the denominator, it is assumed that the proposition $H_2$ is true, and x and 
y denote features of items that come from different sources. The challenge for 
the forensic examiner is that of selecting the most appropriate data for obtaining 
the distribution in the denominator. Note that there are different ways to address 
this question because, depending on the case at hand, it might be appropriate 
to condition on (i) the known source (i.e., pursuing a so-called source-anchored 
approach), (ii) the trace (i.e., trace-anchored approach), or (iii) none of these (i.e., 
non-anchored approach). This amounts to evaluating the score using the probability 
distribution that is obtained by producing scores for comparisons between 
(i) the features x of the control item from the known source and features of items 
taken from randomly selected sources of the relevant population, (ii) the features 
y of the trace item and features of items taken from sources selected randomly in 
the relevant population, (iii) features of pairs of items taken from sources selected 
randomly in the relevant population (i.e., without using x and y). Formally, this 
amounts to defining the distribution in the denominator as follows: 


(i) $\hat{g}(\Delta(x, y) \mid x, H_2)$, 
(ii) $\hat{g}(\Delta(x, y) \mid y, H_2)$, 
(iii) $\hat{g}(\Delta(x, y) \mid H_2)$. 


See, e.g., Hepler et al. (2012) for a discussion of this topic. 
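The construction of a score-based Bayes factor from two sets of reference scores can be sketched as follows. This Python example is a rough illustration under strong assumptions: the scores are simulated rather than obtained from casework comparisons, the densities are estimated with a hand-written Gaussian kernel density estimator, and the bandwidth follows a common normal-reference rule of thumb. Whether the denominator corresponds to approach (i), (ii), or (iii) depends entirely on how the different-source scores are produced, which the simulation does not model.

```python
import random
from math import exp, pi, sqrt
from statistics import stdev

def gaussian_kde(sample):
    """Return a Gaussian kernel density estimate built from a list of scores."""
    n = len(sample)
    bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)  # normal-reference rule of thumb
    def density(score):
        kernels = (exp(-0.5 * ((score - s) / bandwidth) ** 2) for s in sample)
        return sum(kernels) / (n * bandwidth * sqrt(2 * pi))
    return density

# Simulated distance scores (hypothetical stand-ins for real comparison data).
rng = random.Random(1)
same_source_scores = [abs(rng.gauss(0, 1)) for _ in range(500)]  # scores under H1
diff_source_scores = [abs(rng.gauss(0, 3)) for _ in range(500)]  # scores under H2

g_h1 = gaussian_kde(same_source_scores)  # estimated density in the numerator
g_h2 = gaussian_kde(diff_source_scores)  # estimated density in the denominator

def score_based_bf(score):
    """Ratio of the two estimated densities evaluated at the observed score."""
    return g_h1(score) / g_h2(score)

print(score_based_bf(0.5))  # small distance
print(score_based_bf(4.0))  # large distance
```

In this simulation a small distance yields a value above 1 and a large distance a value below 1. A plain Gaussian kernel is biased near zero for non-negative distances, and estimates in the tails rest on few scores, so a real application would need a more careful density model.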


Example 1.3 (Image Comparison) Consider a hypothetical case where the 
face of an individual is captured by surveillance cameras during the commission 
of a crime. Available screenshots are compared with the reference 
image(s) of a person of interest. For image comparison purposes, the evidence 
to be considered is a score given by the distance between the feature vectors 
x of the known reference and the evidential recording y (see Jacquet and 
Champod (2020) for a review). Consider the following competing propositions. 
$H_1$: The person of interest is the individual shown in the images of the 
surveillance camera, versus $H_2$: An unknown person is depicted in the image 
of the surveillance camera. To help specify the probability distribution of the 
score in the numerator, one can take several pairs of images from the person 
of interest to serve as pairs of questioned and reference items. To inform the 
probability distribution for the score in the denominator, conditioning on the 
reference item x (i.e., the images depicting the person of interest) can be 
justified as it may contain information that is relevant to the case and may 
be helpful for generating scores (Jacquet & Champod, 2020; Hepler et al., 
2012). The distribution in the denominator can thus be computed using a 
source-anchored approach as in (i). The sBF can therefore be obtained as 


$$
\mathrm{sBF} = \frac{\hat{g}(\Delta(x, y) \mid x, H_1)}{\hat{g}(\Delta(x, y) \mid x, H_2)}.
$$

In other types of forensic cases, conditioning on y in the denominator, case (ii), 
may be more appropriate. This represents an asymmetric approach to defining the 
distribution in the numerator and in the denominator. 


Example 1.4 (Handwritten Documents) Consider a case involving handwriting 
on a questioned document. Handwriting features y on the questioned 
document are compared to the handwriting features x of a person of interest. 
The similarities and differences between x and y are measured by a suitable 
metric (score). To inform about the probability distribution of the scores in 
the numerator, one can take several draws of pairs of handwritten characters 
originating from the known source to serve as recovered and control items 
and to obtain scores from the selected draws. Under $H_2$, consideration of 
x is not relevant for the assessment. Note that here $H_2$ is the proposition 
according to which the person of interest is not the source of the handwriting 
on the questioned document, but someone else from the relevant population. It 
would then seem reasonable to construct the distribution for the denominator 
by comparing the features y of the questioned document with features x 
from items of handwriting of persons randomly selected from the relevant 
population of potential writers. This amounts to a trace-anchored approach 
as in situation (ii) defined above. In fact, for handwriting, the approach (i) 
would amount to discarding relevant information related to the questioned 
document. The sBF can therefore be obtained as 


$$
\mathrm{sBF} = \frac{\hat{g}(\Delta(x, y) \mid x, H_1)}{\hat{g}(\Delta(x, y) \mid y, H_2)}.
$$

In yet other cases, the distribution in the denominator may be obtained by 
comparing pairs of items drawn randomly from the relevant population, without 
conditioning on either x or y. In such cases, the alternative proposition $H_2$ is that 
the two compared items come from different sources. 


Example 1.5 (Firearm Examination) Consider a case in which a bullet is 
found at a crime scene and a person carrying a gun is arrested. The extent 
of agreement between marks left by firearms on bullets can be summarized 
by a score or multiple scores. An example of a simple score is the concept of 
consecutive matching striations. To inform the distribution in the numerator, 
the scientist fires multiple bullets using the seized firearm. To inform the 
distribution in the denominator, the scientist fires and compares many bullets 
known to come from different guns (i.e., different relevant models). The 
distribution in the denominator can thus be computed using a non-anchored 
approach. The sBF can therefore be obtained as 


$$
\mathrm{sBF} = \frac{\hat{g}(\Delta(x, y) \mid x, H_1)}{\hat{g}(\Delta(x, y) \mid H_2)}.
$$


Note that this is a coarse approach in the sense that no consideration is given 
to general manufacturing features. Indeed, the amount and quality of striation 
on a bullet may depend on aspects such as the caliber and the composition 
(e.g., jacketed/non-jacketed bullets, etc.), hence a conditioning on y may be 
considered. 


Another example for a non-anchored approach, in the context of fingermark 
comparison, can be found in Leegwater et al. (2017). An example will be presented 
in Sect. 3.3.4. 

Note that the above considerations refer to so-called specific-source cases. In 
such cases, recovered material is compared to material from a known source. 
However, there are also other situations where the competing propositions are as 
follows: 


$H_1$: The recovered and the control material originate from the same source. 
$H_2$: The recovered and the control material originate from different sources. 


For such common-source propositions, the sampling distributions under the 
competing propositions can be learned, under $H_1$, from many scores for known 
same-source pairs (with each pair drawn from a distinct source) and, under $H_2$, 
from many scores for pairs known to come from different sources. The score-based 
BF in this case will account for the occurrence of the observed score under the 
competing propositions, but it does not account for the rarity of the characteristics 
of the trace. 

While a score-based approach has the potential to reduce the dimensionality of 
the problem, the use of scores implies a loss of information because features y and 
x are replaced by a single score. Therefore, there is a trade-off to be found between 
the complexity of the original configuration of features and the performance of the 
score-metric, the choice of which requires a justification. 




For a critical discussion of score-based evaluative metrics, see Neumann (2020) 
and Neumann and Ausdemore (2020). See also Bolck et al. (2015) for a discussion 
of feature- and score-based approaches for multivariate data. 


1.6 Bayes Factor in the Investigative Setting 


While the use of the Bayes factor for evaluative purposes is rather well established 
in both theory and practice, the focus on investigative settings still offers much room 
for original developments. In many forensic settings, especially in early stages of an 
investigation, it may be that no potential source is available for comparison. In such 
situations, it will not be possible to compare characteristics observed on recovered 
and reference materials, as would be the case in an evaluative setting (Sect. 1.5). 
Nevertheless, one can derive valuable information from the recovered material 
alone. Consider, for example, two populations denoted $p_1$ and $p_2$, respectively, and
the following two propositions:


$H_1$: The recovered item comes from population $p_1$ (e.g., a population of females).
$H_2$: The recovered item comes from population $p_2$ (e.g., a population of males).


Denote by $y$ the measurements on the recovered material, which is known to belong
to one of the two populations specified by the competing hypotheses, although it is
not known which one. For such a situation, the Bayes factor measures the change
produced by the measurements $y$ on the recovered item in the odds in favor of $H_1$,
as compared to $H_2$, when going from the prior to the posterior distribution.

Assume that a parametric statistical model $\{f(\cdot \mid \theta), \theta \in \Theta\}$ is suitable for
the data at hand. The problem of discriminating between two populations can then
be treated as a problem of comparing statistical hypotheses, assuming that the
probability distribution for the measurements on the recovered material (under each
hypothesis) is of a given form. Consider, first, the situation where the parameters
characterizing the two populations are known, that is, $\theta = \theta_1$ if the recovered item
comes from population $p_1$ and $\theta = \theta_2$ if the recovered item comes from population
$p_2$. Formally, this amounts to specifying the probability distributions $f(y \mid \theta_1)$ and
$f(y \mid \theta_2)$, respectively. The posterior probability of the competing propositions can
be computed as in (1.5) and the Bayes factor simplifies to a ratio of likelihoods as
in (1.6).


Example 1.6 (Fingermark Examination) Consider a case involving a single 
fingermark of unknown source. The fingerprint examiner seeks to help with 
the question of whether the mark comes from a man or woman. Thus, for 
investigative purposes, the following two propositions are of interest: 


$H_1$: The fingermark comes from a man.
$H_2$: The fingermark comes from a woman.


A type of data that can be acquired from fingermarks is ridge width,
summarized in terms of the ridge count per unit of surface (in mm$^2$). See, for
example, Appendix A of Champod et al. (2016) for a summary of different data
collections. Consider ridge density, which was found to vary as a function
of sex (i.e., women tend to have narrower ridges than men), but also between
populations. Suppose that normality represents a reasonable assumption for
ridge density so that the probability distribution for available measurements
can be considered Normal, $N(\theta_i, \sigma_i^2)$, with the unknown mean $\theta$ being equal
to $\theta_i$ and the variance $\sigma^2$ being equal to $\sigma_i^2$ if $H_i$ is true. Given $H_1$, the
measurements $y$ thus have a probability distribution $N(\theta_1, \sigma_1^2)$ and, given $H_2$,
a probability distribution $N(\theta_2, \sigma_2^2)$.

The posterior probability of the competing propositions can be computed
as in (1.5), and the Bayes factor simplifies to a likelihood ratio as in (1.6), that
is,

\[
\mathrm{BF} = \frac{N(y \mid \theta_1, \sigma_1^2)}{N(y \mid \theta_2, \sigma_2^2)}.
\]
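
The ratio of Normal densities in Example 1.6 can be sketched in a few lines of Python.
This is only an illustration: the function names and all numerical values (population
means, variances, and the measured ridge density) are hypothetical and are not taken
from any data collection.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a Normal distribution N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bf_known_parameters(y, theta1, var1, theta2, var2):
    """Bayes factor for H1 versus H2 when both populations are fully specified."""
    return normal_pdf(y, theta1, var1) / normal_pdf(y, theta2, var2)


# Hypothetical population parameters (ridge counts per unit surface), illustration only.
theta_men, var_men = 11.0, 1.5 ** 2
theta_women, var_women = 13.0, 1.5 ** 2
y = 10.5  # hypothetical ridge density measured on the fingermark

bf = bf_known_parameters(y, theta_men, var_men, theta_women, var_women)
print(f"BF (man versus woman) = {bf:.2f}")
```

With equal variances, a measurement lying exactly halfway between the two means gives
a Bayes factor of 1, that is, findings that do not discriminate between the propositions.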


Generally, however, the parameters, or some of the parameters, characterizing the
two distributions are unknown and a pair of probability distributions will be
introduced to model this uncertainty. As a consequence, the Bayes factor will also
depend on prior assumptions and will not simplify to a likelihood ratio. Consider the
case where parameters $\theta_i$ are continuous and take values in the parameter space $\Theta_i$.
A prior distribution $\pi(\theta_i \mid p_i)$ must be specified, with $\theta_i \in \Theta_i$ and $p_i$ representing
the population of interest. A marginal distribution for each population $p_i$ can be
computed as in (1.2),

\[
f_{H_i}(y) = \int_{\Theta_i} f(y \mid \theta_i)\, \pi(\theta_i \mid p_i)\, d\theta_i, \tag{1.25}
\]

and the Bayes factor will take the form of a ratio of marginal likelihoods as in (1.3),
that is,

\[
\mathrm{BF} = \frac{f_{H_1}(y)}{f_{H_2}(y)}. \tag{1.26}
\]


Example 1.7 (Fingermark Examination, Continued) Recall Example 1.6
where a Normal probability distribution was assumed for the measured ridge
density on a fingermark, with variance known and equal to $\sigma_i^2$. A conjugate
prior distribution may be introduced for the population mean $\theta_i$, say $\theta_i \sim
N(\mu_i, \tau_i^2)$. The marginal likelihoods are still Normal with mean equal to the
prior mean $\mu_i$ and variance equal to the sum of the prior variance $\tau_i^2$ and the
population variance $\sigma_i^2$. The Bayes factor therefore is

\[
\mathrm{BF} = \frac{N(y \mid \mu_1, \tau_1^2 + \sigma_1^2)}{N(y \mid \mu_2, \tau_2^2 + \sigma_2^2)}.
\]


The same idea can be extended to the case where both the mean and the 
variance are unknown. This will be addressed in Sect. 4.3.2. 
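
As a rough illustration of Example 1.7, the following Python sketch evaluates the ratio
of the two Normal marginal likelihoods. The function names and all numerical values
(prior means, prior variances, population variances, measurement) are assumptions made
for the sake of the example.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a Normal distribution N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bf_unknown_means(y, mu1, tau1_sq, sigma1_sq, mu2, tau2_sq, sigma2_sq):
    """Ratio of marginal likelihoods N(y | mu_i, tau_i^2 + sigma_i^2), i = 1, 2."""
    marginal_h1 = normal_pdf(y, mu1, tau1_sq + sigma1_sq)
    marginal_h2 = normal_pdf(y, mu2, tau2_sq + sigma2_sq)
    return marginal_h1 / marginal_h2


y = 10.5  # hypothetical measurement

# Hypothetical prior means 11 and 13, prior variance 1, population variance 2.25.
bf_with_prior = bf_unknown_means(y, 11.0, 1.0, 2.25, 13.0, 1.0, 2.25)

# Setting the prior variances to zero recovers the known-parameter likelihood ratio.
bf_point = bf_unknown_means(y, 11.0, 0.0, 2.25, 13.0, 0.0, 2.25)

print(f"BF with uncertain means: {bf_with_prior:.2f}")
print(f"BF with known means:     {bf_point:.2f}")
```

In this particular setting, acknowledging uncertainty about the population means widens
both marginal distributions and moves the Bayes factor closer to 1.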


The Bayes factor thus depends on the prior assumptions about parameters 
characterizing each population. This must not be confused, as noted earlier, with 
prior probabilities for competing propositions. The latter will form the prior odds 
which will be multiplied by the Bayes factor to compute the posterior odds:

\[
\frac{\Pr(H_1 \mid y)}{\Pr(H_2 \mid y)} = \frac{f_{H_1}(y)}{f_{H_2}(y)} \times \frac{\Pr(H_1)}{\Pr(H_2)}.
\]


The Bayesian approach for discriminating between two propositions regarding 
population membership can be easily generalized to the case where there are any 
number $k$ ($k > 2$) of competing mutually exclusive propositions. Let $H_1, \dots, H_k$
denote $k$ propositions and denote by $y$ the observation to be evaluated. The
propositions of interest can be defined as follows:


$H_1$: The recovered item comes from population 1 ($p_1$).
$H_2$: The recovered item comes from population 2 ($p_2$).
...
$H_k$: The recovered item comes from population $k$ ($p_k$).


Example 1.8 (Questioned Documents) Consider a case involving questioned 
documents where the issue of interest is which of three printing machines has 
been used to print a questioned document. Propositions of interest are: 


$H_1$: The questioned documents have been printed with printer 1.
$H_2$: The questioned documents have been printed with printer 2.
$H_3$: The questioned documents have been printed with printer 3.


After having specified a Bayesian statistical model for each proposition
(i.e., a probability distribution for the available measurements and a prior
distribution for the unknown parameters), the marginal likelihoods $f_{H_i}(y)$,
$i = 1, 2, 3$, characterizing propositions $H_1$, $H_2$, and $H_3$, can be obtained as in
(1.25).


Occasionally, cases involve multiple propositions. Imagine a case involving DNA 
findings, such as bloodstains recovered on a crime scene, with the reported profile 
being compared against the profile of a person of interest. The defense argues that 
the bloodstain does not come from the person but from either a relative (e.g., a 
brother) or an unknown person. A question that may arise in such a case is how to 
evaluate and report results, because the Bayes factor involves pairwise comparisons. 
One option is to report only the marginal likelihoods $f_{H_i}(y)$, even if they may not be
easy to interpret. Alternatively, one may report a scaled version $f^*_{H_i}(y)$ as suggested
by Berger and Pericchi (2015), that is,

\[
f^*_{H_i}(y) = \frac{f_{H_i}(y)}{\sum_{j=1}^{k} f_{H_j}(y)}. \tag{1.27}
\]

This expression will be much easier to interpret, because the scaled likelihoods
$f^*_{H_i}(y)$ sum up to 1. Generally, prior probabilities $\Pr(H_i)$ may vary between
recipients of such reports, but the posterior probability can be easily computed as

\[
\Pr(H_i \mid y) = \frac{\Pr(H_i)\, f^*_{H_i}(y)}{\sum_{j=1}^{k} \Pr(H_j)\, f^*_{H_j}(y)},
\]

followed, if required, by classification of the recovered material in the population
with the highest posterior probability. Note that reporting the scaled version in (1.27)
is equivalent to assuming equal prior probabilities for competing propositions. In
fact, if $\Pr(H_i) = 1/k$, $i = 1, \dots, k$, then it can easily be shown that

\[
\Pr(H_i \mid y) = \frac{f^*_{H_i}(y)}{\sum_{j=1}^{k} f^*_{H_j}(y)} = f^*_{H_i}(y), \quad i = 1, \dots, k,
\]

as $\sum_{j=1}^{k} f^*_{H_j}(y) = 1$.
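
A minimal Python sketch of the scaling in (1.27) and of the subsequent computation of
posterior probabilities is given below. The marginal likelihood values and the prior
probabilities are hypothetical, as are the function names.

```python
def scaled_likelihoods(marginals):
    """Scale marginal likelihoods so that they sum to 1, as in (1.27)."""
    total = sum(marginals)
    return [m / total for m in marginals]


def posterior_probabilities(scaled, priors):
    """Combine scaled likelihoods with a recipient's prior probabilities."""
    weighted = [p * s for p, s in zip(priors, scaled)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical marginal likelihoods f_{H_i}(y) for k = 3 propositions.
marginals = [0.012, 0.003, 0.0005]
scaled = scaled_likelihoods(marginals)

# Equal prior probabilities: the posterior probabilities equal the scaled likelihoods.
post_equal = posterior_probabilities(scaled, [1 / 3, 1 / 3, 1 / 3])

# A recipient with different (hypothetical) prior probabilities.
post_unequal = posterior_probabilities(scaled, [0.2, 0.5, 0.3])

# Classification in the population with the highest posterior probability.
best = max(range(len(post_unequal)), key=post_unequal.__getitem__)
print("scaled likelihoods:", [round(s, 4) for s in scaled])
print("posterior (unequal priors):", [round(p, 4) for p in post_unequal])
print("classified in population", best + 1)
```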

The analyst may also consider the possibility of summarizing several propositions
into one, in order to produce a comparison between two propositions regarding
population membership. One of these propositions will be composite. Let $\bar{H}_1 =
H_2 \cup \dots \cup H_k$. Starting from $k$ possible populations from which the recovered
material may come, a pair of competing propositions of interest may thus be
formulated as follows:


$H_1$: The recovered item comes from population 1 ($p_1$).
$\bar{H}_1$: The recovered item comes from one of the other populations ($p_2, \dots, p_k$).


The marginal likelihood $f_{H_1}(y)$ characterizing proposition $H_1$ is obtained as in
(1.25), while the marginal likelihood under $\bar{H}_1$ is

\[
f_{\bar{H}_1}(y) = \sum_{i=2}^{k} \Pr(p_i) \int_{\Theta_i} f(y \mid \theta_i)\, \pi(\theta_i \mid p_i)\, d\theta_i,
\]

with $\sum_{i=1}^{k} \Pr(p_i) = 1$. The Bayes factor expressing the value of $y$ for comparing
$H_1$ against $\bar{H}_1$ becomes

\[
\mathrm{BF} = \frac{f_{H_1}(y) \sum_{i=2}^{k} \Pr(p_i)}{f_{\bar{H}_1}(y)}. \tag{1.28}
\]

The posterior odds become

\[
\frac{\Pr(H_1 \mid y)}{\Pr(\bar{H}_1 \mid y)} = \frac{f_{H_1}(y) \Pr(p_1)}{f_{\bar{H}_1}(y)}
\]

(Aitken et al., 2021, p. 643).
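
The composite comparison in (1.28) can be sketched as follows. The marginal likelihoods
and population probabilities are hypothetical values chosen for illustration, and the
function names are not taken from any package.

```python
def composite_marginal(marginals, pop_probs):
    """Marginal likelihood under the composite proposition (populations 2, ..., k)."""
    return sum(p * f for p, f in zip(pop_probs[1:], marginals[1:]))


def composite_bayes_factor(marginals, pop_probs):
    """Bayes factor for H1 against the composite alternative, as in (1.28)."""
    prob_rest = sum(pop_probs[1:])
    return marginals[0] * prob_rest / composite_marginal(marginals, pop_probs)


def composite_posterior_odds(marginals, pop_probs):
    """Posterior odds of H1 against the composite alternative."""
    return marginals[0] * pop_probs[0] / composite_marginal(marginals, pop_probs)


# Hypothetical marginal likelihoods f_{H_i}(y) and probabilities Pr(p_i), k = 3.
marginals = [0.012, 0.003, 0.0005]
pop_probs = [0.2, 0.5, 0.3]

bf = composite_bayes_factor(marginals, pop_probs)
prior_odds = pop_probs[0] / sum(pop_probs[1:])
post_odds = composite_posterior_odds(marginals, pop_probs)
print(f"BF = {bf:.3f}, prior odds = {prior_odds:.3f}, posterior odds = {post_odds:.3f}")
```

Note that, unlike in the two-population case, this Bayes factor depends on the
probabilities assigned to the populations grouped under the composite proposition.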


Example 1.9 (Questioned Documents, Continued) Consider the following
propositions:


$H_1$: The questioned documents have been printed with printer 1.
$\bar{H}_1$: The questioned documents have been printed with printer 2 or with
printer 3.

The marginal likelihood characterizing proposition $H_1$ is

\[
f_{H_1}(y) = \int_{\Theta_1} f(y \mid \theta_1)\, \pi(\theta_1 \mid p_1)\, d\theta_1.
\]

The marginal likelihood characterizing proposition $\bar{H}_1$ will become

\[
f_{\bar{H}_1}(y) = \Pr(p_2) \int_{\Theta_2} f(y \mid \theta_2)\, \pi(\theta_2 \mid p_2)\, d\theta_2
 + \Pr(p_3) \int_{\Theta_3} f(y \mid \theta_3)\, \pi(\theta_3 \mid p_3)\, d\theta_3,
\]

and the Bayes factor can be obtained as in (1.28).


Table 1.1 Scale for verbally expressing support provided by the observations for one
hypothesis over an alternative, adapted from Jeffreys (1961)

BF           Evidence in favor of $H_1$
1 to 3.2     Not worth more than a bare mention
3.2 to 10    Substantial
10 to 100    Strong
>100         Decisive


Table 1.2 Verbal scale for expressing evidential value, in terms of the Bayes factor, in support of
the prosecution’s proposition over the alternative (defense) proposition (Willis et al., 2015)

Value of the BF         Verbal equivalent: The forensic findings ...
1                       Do not support one hypothesis over the other
2 to 10                 Provide weak support (for the first hypothesis relative to the alternative)
10 to 100               Provide moderate support (idem)
100 to 1000             Provide moderately strong support (idem)
1000 to 10,000          Provide strong support (idem)
10,000 to 1,000,000     Provide very strong support (idem)
1,000,000 and above     Provide extremely strong support (idem)


1.7 Bayes Factor Interpretation 


The Bayes factor is a coherent measure of the change in support that the findings 
provide for one hypothesis against a given alternative (Jeffrey, 1975). Table 1.1 
shows a guide for expressing Bayes factors verbally, following Jeffreys (1961). A 
historical review is presented in Aitken and Taroni (2021). 

The verbal equivalent must express a degree of support for one of the propo- 
sitions relative to an alternative and is defined from ranges of Bayes factor values. 
Qualitative interpretations of the Bayes factor have also been proposed in the context 
of forensic science (Evett, 1987, 1990; Evett et al., 2000; Nordgaard et al., 2012; 
Willis et al., 2015). Table 1.2 summarizes an example of a scale given in the ENFSI 
Guideline for Evaluative Reporting in Forensic Science (Willis et al., 2015), inspired 
by the scale proposed by Nordgaard et al. (2012). Users of these scales must be 
aware that labelling several Bayes factor apportionments offers a broad descriptive
statement about standards of evidence in scientific investigation and not a calibration
of the Bayes factor (Kass, 1993). See, e.g., Ramos and Gonzalez-Rodriguez (2013),
van Leeuwen and Brümmer (2013) and Aitken et al. (2021) for an account of
calibration as a measure of performance of BF computation methods. 

Moreover, it is important to note that the choice of a reported verbal equivalent is 
based on the magnitude of the Bayes factor and not the reverse. Marquis et al. (2016) 
present a discussion on how to implement a verbal scale in a forensic laboratory, 
considering benefits, pitfalls, and suggestions to avoid misunderstandings. 

It is worth reiterating that a Bayes factor represents a measure of change in
support rather than a measure of support, though the two expressions may be
perceived as equivalent. In fact, the Bayes factor can be shown to be a non-coherent
measure of support: a small Bayes factor means that the data will lower
the probability of the hypothesis of interest relative to its value prior to considering 
the evidence, but it does not imply that the probability of this hypothesis is low. The 
Bayes factor measures the change produced in the odds, thus providing a measure 
of whether the available findings have increased or decreased the odds in favor of 
one proposition compared to the alternative (Bernardo & Smith, 2000). 


1.8 Computational Aspects 


The computation of Bayes factors can be challenging, especially when the marginal
likelihoods in the numerator and in the denominator (1.2) involve integrals that do
not have an analytical solution. Several methods have been proposed to address this
complication. See Kass and Raftery (1995) and Han and Carlin (2001) for a review.
Consider the following general expression for the marginal likelihood:

\[
f(x) = \int f(x \mid \theta)\, \pi(\theta)\, d\theta. \tag{1.29}
\]

If the likelihood $f(x \mid \theta)$ and the prior $\pi(\theta)$ are not conjugate, then
an analytical solution may not be available. But suppose that it is possible to
draw values from the prior distribution $\pi(\cdot)$. The integral in (1.29) can then be
approximated by Monte Carlo methods as

\[
\hat{f}_1(x) = \frac{1}{N} \sum_{i=1}^{N} f(x \mid \theta^{(i)}), \tag{1.30}
\]

where $\theta^{(i)}$, $i = 1, \dots, N$, are $N$ independent draws from $\pi(\cdot)$. This is the average
of the likelihoods evaluated at the sampled values. An example will be provided in
Sect. 2.2.2 (Example 2.3).
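
The Monte Carlo average in (1.30) can be sketched in Python as follows. To allow a
check, the sketch assumes a Normal likelihood with known variance and a Normal prior
for the mean, a conjugate case in which the marginal likelihood is available in closed
form. All numerical values and function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a Normal distribution N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mc_marginal_likelihood(x, mu, tau_sq, sigma_sq, n_draws, rng):
    """Average of the likelihood over draws from the prior N(mu, tau_sq)."""
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(mu, math.sqrt(tau_sq))  # draw from the prior
        total += normal_pdf(x, theta, sigma_sq)   # likelihood at the drawn value
    return total / n_draws


rng = random.Random(1)  # fixed seed for reproducibility
x, mu, tau_sq, sigma_sq = 10.5, 11.0, 1.0, 2.25  # hypothetical values

estimate = mc_marginal_likelihood(x, mu, tau_sq, sigma_sq, 100_000, rng)
exact = normal_pdf(x, mu, tau_sq + sigma_sq)  # closed form, conjugate case only
print(f"Monte Carlo estimate: {estimate:.5f}, exact value: {exact:.5f}")
```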

This simulation process can be rather inefficient when the posterior distribution is
concentrated, relative to the prior, as most of the $\theta^{(i)}$ will have a small likelihood and
the estimate $\hat{f}_1(x)$ in (1.30) may be dominated by a few values with large likelihood.
The precision of the Monte Carlo integration can be improved by importance
sampling (Kass & Raftery, 1995). Moreover, statistical packages (e.g., in R) allow
one to sample from a certain number of distributions.

Importance sampling as well as other Monte Carlo tools may help to overcome
such difficulties as there is no need for the distribution $\pi(\theta)$ to be available in closed
form. Consider any manageable density $\pi^*(\theta)$ from which it is feasible to sample.
The integral in (1.29) can then be approximated by importance sampling as

\[
\hat{f}_2(x) = \frac{\sum_{i=1}^{N} w_i\, f(x \mid \theta^{(i)})}{\sum_{i=1}^{N} w_i}, \tag{1.31}
\]

where $\theta^{(i)}$ are independent draws from $\pi^*(\theta)$ and are weighted by importance
weights $w_i = \pi(\theta^{(i)})/\pi^*(\theta^{(i)})$. The function $\pi^*(\theta)$ is known as the importance
sampling function (e.g., Geweke, 1989). An example will be provided in Sect. 2.2.2
(Example 2.3).
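
The weighted average in (1.31) may be sketched as below, again for a Normal likelihood
with a Normal prior so that the result can be compared with the closed-form value. The
importance sampling function used here, an equal-weight mixture of the prior and a
Normal density located where prior and likelihood overlap, is an assumption of this
sketch (it keeps the weights bounded); it is not a choice prescribed by the text. All
numerical values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a Normal distribution N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def is_marginal_likelihood(x, mu, tau_sq, sigma_sq, n_draws, rng):
    """Self-normalized importance sampling estimate of f(x), as in (1.31)."""
    # Location and spread of the local mixture component (closed form here).
    post_var = 1.0 / (1.0 / tau_sq + 1.0 / sigma_sq)
    post_mean = post_var * (mu / tau_sq + x / sigma_sq)
    local_var = 2.0 * post_var

    num = den = 0.0
    for _ in range(n_draws):
        # Draw from the mixture: prior or local component, each with probability 0.5.
        if rng.random() < 0.5:
            theta = rng.gauss(mu, math.sqrt(tau_sq))
        else:
            theta = rng.gauss(post_mean, math.sqrt(local_var))
        prior = normal_pdf(theta, mu, tau_sq)
        proposal = 0.5 * prior + 0.5 * normal_pdf(theta, post_mean, local_var)
        w = prior / proposal  # importance weight, bounded above by 2
        num += w * normal_pdf(x, theta, sigma_sq)
        den += w
    return num / den


rng = random.Random(2)  # fixed seed for reproducibility
# Hypothetical setting with a prior that is diffuse relative to the likelihood.
x, mu, tau_sq, sigma_sq = 10.5, 11.0, 25.0, 0.25

estimate = is_marginal_likelihood(x, mu, tau_sq, sigma_sq, 100_000, rng)
exact = normal_pdf(x, mu, tau_sq + sigma_sq)  # closed form, conjugate case only
print(f"Importance sampling estimate: {estimate:.5f}, exact value: {exact:.5f}")
```

An importance sampling function with lighter tails than the prior would produce
occasional very large weights and an unstable denominator, which is why the mixture
includes the prior itself.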


In the case where $\pi^*(\theta)$ is taken to be the posterior density $\pi(\theta \mid x) = \pi(\theta) f(x \mid
\theta)/f(x)$, the use of this expression in (1.31) yields the harmonic mean of the
sampled likelihood values as an estimate for the marginal likelihood $f(x)$:

\[
\hat{f}_3(x) = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{f(x \mid \theta^{(i)})} \right]^{-1}.
\]


Note that, whatever method is used, the output of such a simulation procedure is an 
approximation that must be handled carefully. Notwithstanding, it is worth pointing 
out that while the Monte Carlo estimate is not exact, the Monte Carlo error (e.g.,
$f(x) - \hat{f}_1(x)$) can be very small if a sufficiently large number of draws are generated.
A study of Monte Carlo errors for the quantification of the value of forensic evidence 
is provided by Ommen et al. (2017). 

Many practical problems require more advanced techniques based on Markov
chain Monte Carlo (MCMC) methods to overcome computational hurdles. The
general idea behind these methods is to sample recursively values $\theta^{(i)}$ from some
transition distribution that depends on the previous draw $\theta^{(i-1)}$ in such a way that at
each step of the iteration process, we expect to draw from a distribution that becomes
closer (i.e., converges) to the target posterior distribution $\pi(\theta \mid x)$. This means that,
after many iterations, $\theta^{(i)}$ is approximately distributed according to $\pi(\theta \mid x)$ and can
be used like the output of a Monte Carlo simulation algorithm. To avoid the effect
of starting values, the first set of iterations is generally discarded (this is called the
burn-in period), and the simulated values beyond the first $n_b$ iterations,

\[
\theta^{(n_b+1)}, \dots, \theta^{(N)},
\]

are taken as draws from the target posterior distribution. The Gibbs sampling
algorithm is a well-known method to construct a chain with these features. Suppose
that the parameter vector can be decomposed into several components, say $\theta =
(\theta_1, \dots, \theta_p)$, and let $\pi(\theta_j \mid \theta_{-j}^{(i-1)})$ denote the so-called full conditional distribution,
that is, the conditional distribution of $\theta_j$ at step $(i)$ given all the other components,
say $\theta_{-j}$, at the previous step $(i-1)$,

\[
\theta_{-j}^{(i-1)} = \left(\theta_1^{(i)}, \dots, \theta_{j-1}^{(i)}, \theta_{j+1}^{(i-1)}, \dots, \theta_p^{(i-1)}\right).
\]

For many problems, it is possible to sample easily from the conditional distributions,
as is the case when distributions are conjugate. The Gibbs sampling algorithm
works as follows: start with an arbitrary value $\theta^{(0)} = (\theta_1^{(0)}, \dots, \theta_p^{(0)})$ and generate
$\theta_j^{(i)}$ at each iteration according to the conditional distribution given the current
values $\theta_{-j}^{(i-1)}$. Examples will be given in Sects. 3.4.1.3 (Example 3.14) and 3.4.3
(Example 3.16).
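
The Gibbs sampling scheme can be illustrated with a small Python sketch. The model
used here is an assumption chosen because both full conditionals are standard: Normal
data with unknown mean $\theta$ and unknown variance $\sigma^2$, a Normal prior for the mean, and
an inverse gamma prior for the variance. The data and hyperparameters are hypothetical.

```python
import math
import random


def gibbs_normal(data, mu0, tau0_sq, a, b, n_iter, n_burn, rng):
    """Gibbs sampler for (theta, sigma_sq) with theta ~ N(mu0, tau0_sq) and
    sigma_sq ~ Inverse-Gamma(a, b), given Normal data."""
    n = len(data)
    sum_x = sum(data)
    theta, sigma_sq = mu0, 1.0  # arbitrary starting values
    draws = []
    for i in range(n_iter):
        # Full conditional of theta given sigma_sq: Normal.
        cond_var = 1.0 / (1.0 / tau0_sq + n / sigma_sq)
        cond_mean = cond_var * (mu0 / tau0_sq + sum_x / sigma_sq)
        theta = rng.gauss(cond_mean, math.sqrt(cond_var))
        # Full conditional of sigma_sq given theta: inverse gamma,
        # sampled as the reciprocal of a gamma variate (shape, scale = 1 / rate).
        shape = a + n / 2.0
        rate = b + 0.5 * sum((x - theta) ** 2 for x in data)
        sigma_sq = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if i >= n_burn:  # discard the burn-in period
            draws.append((theta, sigma_sq))
    return draws


rng = random.Random(3)  # fixed seed for reproducibility
data = [10.2, 11.4, 10.9, 12.1, 11.0, 10.6, 11.8, 11.3]  # hypothetical measurements
draws = gibbs_normal(data, mu0=11.0, tau0_sq=4.0, a=2.0, b=1.0,
                     n_iter=5000, n_burn=1000, rng=rng)

theta_mean = sum(t for t, _ in draws) / len(draws)
sigma_sq_mean = sum(s for _, s in draws) / len(draws)
print(f"posterior mean of theta:    {theta_mean:.3f}")
print(f"posterior mean of sigma_sq: {sigma_sq_mean:.3f}")
```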

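Whenever it is not possible to decompose the joint distribution in manageable
conditionals, one can implement an alternative approach, the Metropolis-Hastings
(M-H) algorithm (e.g., Gelman et al., 2014). This algorithm can be summarized as
follows. Start with an arbitrary value $\theta^{(0)} = (\theta_1^{(0)}, \dots, \theta_p^{(0)})$ and generate $\theta^{(i)}$ at
each iteration, as follows:


1. Draw a proposal value $\theta^{\mathrm{prop}}$ from a density $q(\theta^{(i-1)}, \theta^{\mathrm{prop}})$, called candidate
generating density.
2. Compute a probability of acceptance as follows:

\[
\alpha\left(\theta^{(i-1)}, \theta^{\mathrm{prop}}\right) = \min\left\{ \frac{\pi(\theta^{\mathrm{prop}} \mid x)\, q(\theta^{\mathrm{prop}}, \theta^{(i-1)})}{\pi(\theta^{(i-1)} \mid x)\, q(\theta^{(i-1)}, \theta^{\mathrm{prop}})},\ 1 \right\}. \tag{1.32}
\]

3. Accept the proposed value $\theta^{\mathrm{prop}}$ with probability $\alpha(\theta^{(i-1)}, \theta^{\mathrm{prop}})$, and set $\theta^{(i)} =
\theta^{\mathrm{prop}}$; otherwise, reject the proposed value and set $\theta^{(i)} = \theta^{(i-1)}$.


Note that if the candidate generating density is symmetric (e.g., a Normal
probability density), the acceptance probability in (1.32) simplifies to

\[
\alpha\left(\theta^{(i-1)}, \theta^{\mathrm{prop}}\right) = \min\left\{ \frac{\pi(\theta^{\mathrm{prop}} \mid x)}{\pi(\theta^{(i-1)} \mid x)},\ 1 \right\}.
\]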

The performance of an MCMC algorithm can be monitored by inspecting graphs
and computing diagnostic statistics. Such exploratory analysis is fundamental for
assessing convergence to the posterior distribution. An example will be given in
Sect. 2.2.2 (Example 2.6).

The output of the MCMC algorithm can be used to provide the marginal
likelihood that is needed for the numerator and the denominator of the Bayes
factor, as proposed by Chib (1995) for a Gibbs sampling algorithm and by Chib
and Jeliazkov (2001) for an M-H algorithm. The key idea is to obtain the marginal
likelihood $f(x)$ by a direct application of Bayes theorem since it can be seen as the
normalizing constant of the posterior density:

\[
f(x) = \frac{f(x \mid \theta^*)\, \pi(\theta^*)}{\pi(\theta^* \mid x)}, \tag{1.33}
\]

where $\theta^*$ is a parameter value with high posterior density. Note that (1.33) is valid
for any parameter value $\theta \in \Theta$. The likelihood $f(x \mid \theta)$ and the prior density $\pi(\theta)$
can be directly computed at a given parameter point $\theta^*$. The posterior density $\pi(\theta \mid
x)$ is unavailable in closed form, but it can be approximated by using the output of
the Gibbs sampling. Consequently, the marginal likelihood can be approximated as

\[
\hat{f}(x) = \frac{f(x \mid \theta^*)\, \pi(\theta^*)}{\hat{\pi}(\theta^* \mid x)}. \tag{1.34}
\]

Examples will be given in Sects. 3.4.1.3 (Example 3.14) and 3.4.3 (Example 3.16).
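
The identity in (1.33) can be checked numerically in a conjugate setting, where the
posterior density is known exactly. The sketch below does this for a binomial likelihood
with a beta prior. It only illustrates the identity: in the method of Chib (1995), the
posterior ordinate would instead be estimated from MCMC output. The counts and
hyperparameters are hypothetical.

```python
import math


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(theta, a, b):
    """Density of a Be(a, b) distribution evaluated at theta in (0, 1)."""
    log_density = ((a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta)
                   - log_beta_fn(a, b))
    return math.exp(log_density)


def binom_pmf(x, n, theta):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)


def marginal_via_identity(theta_star, x, n, a, b):
    """f(x) = f(x | theta*) pi(theta*) / pi(theta* | x), with exact posterior Be(a+x, b+n-x)."""
    return (binom_pmf(x, n, theta_star) * beta_pdf(theta_star, a, b)
            / beta_pdf(theta_star, a + x, b + n - x))


x_obs, n_obs, a, b = 7, 20, 1.0, 1.0  # hypothetical counts and hyperparameters

# Closed-form beta-binomial marginal likelihood, for comparison.
exact = math.comb(n_obs, x_obs) * math.exp(
    log_beta_fn(a + x_obs, b + n_obs - x_obs) - log_beta_fn(a, b))

for theta_star in (0.2, 0.35, 0.7):
    value = marginal_via_identity(theta_star, x_obs, n_obs, a, b)
    print(f"theta* = {theta_star}: f(x) = {value:.6f} (exact {exact:.6f})")
```

The value is the same at every $\theta^*$. A point of high posterior density is preferred in
practice because the posterior ordinate is estimated more precisely there.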

This short overview of computational tools is not intended to be exhaustive. 
There are instances, for example when dealing with high-dimensional distributions, 
where the simulation process is very slow, giving rise to inefficiencies in the 
behavior of the Gibbs sampler or Metropolis algorithm. An alternative solution 
is given by the Hamiltonian Monte Carlo (HMC) method, where the proposal 
distribution is not centered on the current position of the chain but changes depending
on the current position of the chain. This allows one to obtain more promising
candidate values, so that the chain avoids getting stuck in a very slow exploration of
the target distribution and moves much more rapidly (Neal, 1996). As in any
Metropolis algorithm, the HMC proceeds by a series of iterations, though it requires
more effort in terms of programming and tuning. The user can refer to a computer
program called Stan (sampling through adaptive neighborhoods) to directly apply
the Hamiltonian Monte Carlo method. The reader can refer to Gelman et al. (2014) 
and Stan Development Team (2021) for instructions and examples. A complete 
picture of basic and more advanced methods of Bayesian computation can be found, 
e.g., in Gelman et al. (2014), Marin and Robert (2014), and Robert and Casella 
(2010). The reader can also refer to Han and Carlin (2001) and to Friel and Pettitt 
(2008) for a review of methods to compute BFs. 

In all examples in this book dealing with the Gibbs sampler and the
Metropolis-Hastings algorithm, we will directly program the computations in R. Other
open-source programs exist, however, that can be used to build Markov chain Monte
Carlo samplers, such as Stan or JAGS (Just Another Gibbs Sampler,
https://mcmc-jags.sourceforge.io/). Both can interact with R (see libraries RStan, rjags and
runjags). Further examples can be found in Albert (2009) and Kruschke (2015).


1.9 Bayes Factor and Decision Analysis


The Bayes factor provides a coherent and quantitative way for relating probabilities
for states of nature, before information is obtained, to probabilities given
information that has become available. A subsequent step, the choice between
different hypotheses, represents a problem of decision-making (Lindley, 1985). For
the purpose of illustration, consider the simple and regularly encountered case where
only two hypotheses are of interest, say $H_1$ and $H_2$. The two hypotheses represent
the list of, more generally, $n$ exclusive and exhaustive uncertain events (also called
states of nature) and denote the entirety of nature. The decision space is the set of
all possible actions, here decisions $d_1$ and $d_2$, where decision $d_i$ can be formalized
as the acceptance of hypothesis $H_i$. The decision problem can be expressed in terms
of a decision matrix (see Table 1.3) with $C_{ij}$ denoting the consequence of deciding
$d_i$ when hypothesis $H_j$ is true. Decision $d_i$ is called “correct” if hypothesis $H_j$ is
true and $i = j$. Conversely, decision $d_i$ is not correct if hypothesis $H_j$ is true and
$i \neq j$, i.e., $H_{-i}$ is true. When choosing between competing hypotheses, one takes
preferences among decision consequences into account, in particular among adverse
outcomes. This aspect is formalized by introducing a measure for expressing the
decision maker’s relative desirability, or undesirability, of the various decision
consequences. To measure the undesirability of consequences on a numerical scale,
one can introduce a loss function $L(\cdot)$, where $L(C_{ij})$ denotes the loss that one
assigns to the outcome of deciding $d_i$ when hypothesis $H_j$ is true.

If it can be agreed that a correct decision represents neither a loss nor a gain, the
loss function for a two-action problem can be described with a two-way table that
contains zeros for the losses $L(C_{ij})$, $i = j$, and the value $l_i$ for $L(C_{ij})$, $i \neq j$. Such
a “$0 - l_i$” loss function is shown in Table 1.4, where $l_i = L(d_i, H_{-i})$ denotes the
loss one incurs whenever decision $d_i$ is a wrong decision.

Table 1.3 Decision matrix with $d_1$ and $d_2$ denoting the possible actions, $H_1$ and $H_2$ denoting the
states of nature, and $C_{ij}$ denoting the consequence of deciding $d_i$ when hypothesis $H_j$ is true

        $H_1$      $H_2$
$d_1$   $C_{11}$   $C_{12}$
$d_2$   $C_{21}$   $C_{22}$


Table 1.4 The “$0 - l_i$” loss function for a decision problem with $d_1$ and $d_2$ denoting the possible
actions, $H_1$ and $H_2$ denoting the states of nature, and $l_i$ denoting the loss associated with adverse
decision consequences

        $H_1$   $H_2$
$d_1$   $0$     $l_1$
$d_2$   $l_2$   $0$


The relative (un-)desirability of available decisions can be expressed by their
expected loss $EL(\cdot)$, computed as

\[
EL(d_i \mid x) = \underbrace{L(d_i, H_i)}_{0}\, \underbrace{\Pr(H_i \mid x)}_{\alpha_i}
 + \underbrace{L(d_i, H_{-i})}_{l_i}\, \underbrace{\Pr(H_{-i} \mid x)}_{\alpha_{-i}}
 = l_i\, \alpha_{-i},
\]


where $x$ denotes the observation or a series of measurements and $\alpha_{-i}$ denotes the
(posterior) probability of the event $H_{-i}$ given $x$. The formal Bayesian decision
criterion is to accept hypothesis $H_1$ if the expected loss of the decision to accept $H_1$
is smaller than the expected loss of rejecting it, that is, if the (posterior) expected
loss of decision $d_1$ is smaller than the (posterior) expected loss of decision $d_2$:

\[
EL(d_1 \mid x) < EL(d_2 \mid x), \quad \text{that is,} \quad l_1 \alpha_2 < l_2 \alpha_1. \tag{1.35}
\]

When rearranging the terms in (1.35) to $\alpha_1/\alpha_2 > l_1/l_2$, and dividing both sides by
the prior odds $\pi_1/\pi_2$, with $\pi_i = \Pr(H_i)$, the Bayes decision criterion states that
accepting $H_1$ is the optimal decision whenever

\[
\frac{\alpha_1/\alpha_2}{\pi_1/\pi_2} > \frac{l_1/l_2}{\pi_1/\pi_2}.
\]

This is equivalent to accepting $H_1$ whenever the Bayes factor in favor of this
proposition is larger than a constant $c$ determined by the prior odds and the loss
ratio. Given a set of observations, the Bayes factor is computed and, depending on
whether or not it exceeds a given threshold, the decision maker chooses between the
members in the list of states of nature (here $H_1$ and $H_2$). Examples will be given
in Chap. 3 in the context of inference of source (Sect. 3.3.3) and in Chap. 4 in the 
context of classification (Sects. 4.2.2 and 4.4.1.2). An extended review of elements 
of decision analysis in forensic science can be found in Taroni et al. (2021b). 
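
The decision criterion can be sketched in Python as follows. The function names, the
Bayes factor values, the prior odds, and the losses are hypothetical and serve only to
show that comparing the Bayes factor with the threshold is equivalent to comparing
posterior expected losses.

```python
def decision_threshold(prior_odds, l1, l2):
    """Constant c = (l1 / l2) / (prior odds) with which the Bayes factor is compared."""
    return (l1 / l2) / prior_odds


def bayes_decision(bf, prior_odds, l1, l2):
    """Return 'd1' (accept H1) if the Bayes factor exceeds the threshold, else 'd2'."""
    return "d1" if bf > decision_threshold(prior_odds, l1, l2) else "d2"


def expected_losses(bf, prior_odds, l1, l2):
    """Posterior expected losses (EL(d1 | x), EL(d2 | x)) under the 0 - l_i loss function."""
    post_odds = bf * prior_odds
    alpha1 = post_odds / (1.0 + post_odds)  # posterior probability of H1
    alpha2 = 1.0 - alpha1                   # posterior probability of H2
    return l1 * alpha2, l2 * alpha1


# Hypothetical inputs: wrongly accepting H1 is taken to be ten times worse
# than wrongly accepting H2, and the prior odds for H1 are 1 to 10.
prior_odds, l1, l2 = 0.1, 10.0, 1.0

for bf in (20.0, 200.0):
    el1, el2 = expected_losses(bf, prior_odds, l1, l2)
    choice = bayes_decision(bf, prior_odds, l1, l2)
    print(f"BF = {bf:>5}: EL(d1) = {el1:.3f}, EL(d2) = {el2:.3f}, decision = {choice}")
```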

This decision criterion is simple and intuitive, yet it poses challenges. For 
example, the requirement to choose a prior probability for the two hypotheses may 
be discomforting, because there is no ad hoc recipe for this purpose. In principle, 
probabilities are personal, since they depend on one’s knowledge (Lindley, 2014). 
They may change as information changes and may vary among individuals. For 
example, a given hypothesis may be considered almost true by one individual, 
but far less probable by someone else. The fact that different individuals with 
different knowledge bases may specify different probabilities for the same event, 
provided that they are accompanied with a justification, is not a problem in principle 
(Lindley, 2000). The only strict requirement to which probability assignments ought 
to conform is coherence (de Finetti, 2017). Coherence has the normative role of 
encouraging people to make careful assignments based on their personal knowledge. 
This can be operationally supported by the concept of scoring rules. See, for 
example, Biedermann et al. (2013, 2017a) for a discussion of scoring rules in the 
context of forensic science.

The same viewpoint applies to utility and loss functions, which may be difficult
to specify. A “correct” utility (or loss) function does not exist, because preference
structures are personal. Adverse decision consequences may be considered more
or less undesirable, depending on the background, the context and the decision
maker’s objectives (e.g., Taroni et al., 2010). Moreover, the loss function does
not need to have constant values, such as the “$0 - l_i$” loss function introduced
above. More general loss functions treat the loss as a function of the severity of the
consequences. Examples will be given in Chap. 2 regarding inference and decision 
about a proportion (Sect. 2.2.3) and about a mean (Sect. 2.3.3). 

Note that, in the context here, the terms “personal” and “subjective” do not 
mean that the theory is arbitrary, unjustified or groundless (Biedermann et al., 
2017b; Taroni et al., 2018). There are various devices for the sound elicitation of 
probabilities and the measurement of the value of decision consequences (Lindley, 
1985). What matters in a situation in which a decision maker is asked to make a 
choice among alternative courses of action that have uncertain consequences is that 
the behavior is one that can be qualified as rational. This includes, in particular, a 
coherent specification of the loss function, reflecting personal preferences among 
consequences in terms of desirability or undesirability. 

This formal decision-analytic approach provides decision criteria that (i) are 
based on clearly defined concepts, (ii) promote rational decision-making under 
uncertainty, and (iii) make a clear distinction between the evaluation of the strength 
of evidence (as given by the Bayes factor), which is the domain of the forensic 
scientist, and the specification of the threshold with which the Bayes factor is 
compared, i.e., the ratio between the loss ratio and the prior odds. The latter lies in 
the domain of the recipient of expert information, such as investigative authorities 
and members of the judiciary. 


1.10 Choice of the Prior Distribution 


Bayesian model builders may encounter various difficulties. One of them is the 
choice of the prior distribution. Bayes theorem does not specify how one ought to 
define the prior distribution. The chosen prior distribution should, however, suitably 
reflect one’s prior beliefs. In this context, so-called vague or non-informative prior 
distributions may help to find a broad consensus. However, it is important to keep 
in mind that even a “non-informative” prior distribution effectively conveys a well- 
defined opinion, i.e., that probabilities spread uniformly over the parameter space 
(de Finetti, 1993a). In contrast to this, personal or so-called informative priors aim 
at encoding available prior knowledge. Whenever feasible, it is advantageous to 
choose a member of the class of conjugate distributions, that is, a family of prior 
distributions such that for any prior in this family and a particular probability 
distribution, the corresponding posterior distribution will be in the same family. 
For example, the beta distribution and the binomial distribution are said to be 
conjugate in this sense. Several examples will be provided throughout this book.

Table 1.5 Some common conjugate prior distribution families

Probability distribution                                          Conjugate prior distribution
Binomial: $f(x \mid \theta) = \mathrm{Bin}(n, \theta)$                               Beta: $\pi(\theta) = \mathrm{Be}(\alpha, \beta)$
Multinomial: $f(x_1, \dots, x_k \mid \theta_1, \dots, \theta_k) = \mathrm{Mult}(n, \theta_1, \dots, \theta_k)$    Dirichlet: $\pi(\theta_1, \dots, \theta_k) = \mathrm{Dir}(\alpha_1, \dots, \alpha_k)$
Poisson: $f(x \mid \lambda) = \mathrm{Pn}(\lambda)$                                  Gamma: $\pi(\lambda) = \mathrm{Ga}(\alpha, \beta)$
Normal (known variance): $f(x \mid \theta, \sigma^2) = N(\theta, \sigma^2)$              Normal: $\pi(\theta) = N(\mu, \tau^2)$
Normal (known mean): $f(x \mid \theta, \sigma^2) = N(\theta, \sigma^2)$                  Inverse Gamma: $\pi(\sigma^2) = \mathrm{IG}(\alpha, \beta)$


Table 1.5 provides a list of some common families of conjugate distributions. A 
more extensive list can be found in Bernardo and Smith (2000). Despite such smooth 
technical options, eliciting a prior distribution may not be easy. 

First, it may be that none of the standard parametric families mentioned above 
is suitable to describe one’s prior degree of belief. There may be circumstances 
where multimodal priors may better reflect the available knowledge, and mixture 
priors would be more convenient (see e.g. Taroni et al., 2010). Another option is to 
specify prior beliefs over a selection of points and then interpolate between them 
(Bolstad & Curran, 2017). More generally, there may be cases where the choice
of a conjugate prior is not appropriate because it does not properly reflect the
available knowledge. In such cases, the application of Bayes’ theorem may lead to a
posterior distribution that is analytically intractable. Such situations require the
implementation of computational tools as described in Sect. 1.8.

Second, practitioners will immediately realize that even if the choice of a given 
standard parametric family may appear justifiable, they will still need to choose 
a member from the selected family. Stated otherwise, they will need to fix the 
hyperparameters of the prior distribution in a way that the resulting shape will 
reasonably reflect their knowledge. Assume that practitioners are in a situation 
where, based on their experience in the field, they can summarize and translate 
their prior beliefs into a numerical value for the prior mean, say m, and into a 
numerical value for the prior standard deviation, say s. They can then find the values 
of the parameters that specify a prior distribution that reflects the assessed prior 
location and prior dispersion, respectively. For example, suppose that the parameter
of interest, θ, is a proportion and that a beta prior distribution is chosen to model
prior uncertainty, i.e., θ ~ Be(α, β). The problem then is how to choose α and β.
If one can specify a value m for the prior mean and a value s for the prior standard
deviation, that is, the two values describing the location and the dispersion of the
prior distribution, one can elicit the hyperparameters α and β by relating the
assessed prior mean and prior variance to the prior moments of a beta distributed
random variable, that is,

\[ m = \frac{\alpha}{\alpha + \beta}, \tag{1.36} \]

\[ s^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}. \tag{1.37} \]

The hyperparameters of the beta prior can then be obtained by solving the two
equations in (1.36) and (1.37) for α and β:

\[ \alpha = m \left[ \frac{m(1 - m)}{s^2} - 1 \right], \tag{1.38} \]

\[ \beta = (1 - m) \left[ \frac{m(1 - m)}{s^2} - 1 \right]. \tag{1.39} \]


It is advisable to inspect the prior distribution thus elicited. Producing a graphical 
representation can help examine whether the shape of the distribution reasonably 
reflects one’s prior beliefs. Moreover, the so-called equivalent sample size n_e should
be calculated in order to examine the reasonableness of the amount of information
that underlies the proposed prior; one should make sure that it is not unrealistically
high (Bolstad & Curran, 2017). Stated otherwise, one should examine whether the
information that is conveyed by the prior is equivalent, at least roughly, to the
information that would be obtained by collecting a sample of equivalent size n_e.
For example, consider a random sample (X_1, ..., X_{n_e}) of size n_e, providing the
same information that is conveyed by the prior. The sample mean
X̄ = (1/n_e) Σ_{i=1}^{n_e} X_i should have, at least roughly, the same location and
the same dispersion as the prior.

For the beta-binomial case, the equivalent sample size n_e can be obtained by
relating the moments of the beta prior to the corresponding moments characterizing
a random sample of size n_e from a Bernoullian population with probability of
success θ:

\[ \frac{\alpha}{\alpha + \beta} = \theta, \tag{1.40} \]

\[ \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} = \frac{\theta (1 - \theta)}{n_e}. \tag{1.41} \]
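The intermediate step can be written out explicitly. By (1.40), the product θ(1 − θ) can be expressed in terms of the hyperparameters, so that both sides of (1.41) share the same numerator:

```latex
\begin{align*}
\theta (1 - \theta) &= \frac{\alpha}{\alpha + \beta} \cdot \frac{\beta}{\alpha + \beta}
                     = \frac{\alpha\beta}{(\alpha + \beta)^2}, \\
\frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
                    &= \frac{\alpha\beta}{(\alpha + \beta)^2 \, n_e}.
\end{align*}
```

Equating the two denominators isolates the equivalent sample size.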


Solving for n_e, one obtains n_e = α + β + 1. If this is felt to be unrealistic, then
one should revise one’s prior assessments, increase the dispersion, and recalculate
the prior. Otherwise, one might specify too much information about the proportion
θ relative to the amount of information provided by a sample of size n_e.

Example 1.10 (Elicitation of a Beta Prior) Suppose that a prior distribution
needs to be elicited for the proportion θ of non-counterfeit merchandise (e.g.,
medicines) in a target population. It is thought that the distribution is centered
around 0.8 with a standard deviation equal to 0.1. Parameters α and β can be
elicited as in (1.38) and (1.39).


> m=0.8
> s=0.1
> a=m*(m*(1-m)/s^2-1)
> b=(1-m)*(m*(1-m)/s^2-1)
> c(a,b)

[1] 12  3

Figure 1.1 shows the elicited beta prior Be(12, 3).


> plot(function(x) dbeta(x,a,b),0,1,xlab=expression
+ (paste(theta)),ylab=expression(paste(pi)*
+ paste('(')*paste(theta)*paste(')')))


The equivalent sample size is 12+3+1=16. This is the size of the sample that 
should be available in terms of information that is equivalent to that conveyed 
by the elicited prior. 


Fig. 1.1 Prior distribution Be(12, 3) over θ in Example 1.10


An objection to this procedure might be that while specifying a value for the
location of the prior may be feasible, this may not necessarily be so for the
dispersion. In many cases, the available prior knowledge takes the form of a
realization (x_1, ..., x_n) of a random sample of size n from a previous experiment.
In this case, it is sufficient to solve (1.40) and (1.41) with respect to α and β for
this sample size n:

\[ \alpha = p (n - 1), \tag{1.42} \]

\[ \beta = (1 - p)(n - 1), \tag{1.43} \]

where θ has been estimated by the sample proportion θ̂ = p = Σ_{i=1}^{n} x_i / n. One
can immediately verify that whenever the hyperparameters α and β are elicited as
in (1.42) and (1.43), then α + β + 1 = n. The elicited parameters reflect the amount
of information provided by a sample of size n.
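As a minimal sketch of (1.42) and (1.43), assume a previous experiment in which 7 of n = 25 inspected items showed the characteristic of interest. These counts and the helper name elicit_beta_from_sample are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: beta hyperparameters elicited from a previous
# sample, following (1.42) and (1.43).
elicit_beta_from_sample <- function(successes, n) {
  p <- successes / n                      # sample proportion, estimate of theta
  c(alpha = p * (n - 1), beta = (1 - p) * (n - 1))
}

# Assumed (illustrative) data: 7 positive items among 25 inspected.
hyp <- elicit_beta_from_sample(successes = 7, n = 25)
hyp

# The equivalent sample size alpha + beta + 1 recovers n.
ne <- sum(hyp) + 1
ne
```

The last line is a quick way to confirm that the elicited prior carries the information of a sample of the stated size and no more.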

Some further practical examples will be provided throughout the book. For an
extended discussion of prior elicitation, the reader can refer to Garthwaite et al.
(2005) and O’Hagan et al. (2006).


1.11 Sensitivity Analysis 


In Sect. 1.4, it has been emphasized that the Bayes factor is not a measure of the 
relative support for the competing propositions provided by the data alone. The 
Bayes factor is influenced by the choice and the elicitation of the subjective prior 
densities (probabilities) for model parameters under propositions H1 and H2. This
reflects background knowledge that may be available to analysts. For this reason, 
prior elicitation of model parameters must not be confused with prior probabilities 
of the propositions of interest. 

While the computation of the Bayes factor requires prior assessments about 
unknown quantities, a main objection to the choice of such prior distributions 
is that they may be hard to define, in particular when the available information 
is limited. Situations characterized by an abundance of relevant data that can be 
used to construct a prior distribution may be rare. Generally, the choice of a prior 
is the result of a subtle combination of relevant information, published data, and 
explainable personal knowledge of the expert. The specification of the prior must be 
taken seriously, because it can be shown that even when a large amount of evidence 
is available, the marginal likelihood is highly sensitive to the choice of the prior 
distribution, and so is the Bayes factor (Gelman et al., 2014). This is different for 
the posterior distribution that is dominated by the likelihood. 

Sensitivity analyses allow one to explore how results may be affected by changes
in the priors (e.g. Kass & Raftery, 1995; Kass, 1993; Liu & Aitkin, 2008). This,
however, may turn out to be computationally intensive and time consuming. An
alternative approach has been proposed by Sinharay and Stern (2002) for comparing
nested models, though it can be extended to non-nested models. The general idea
is to assess the sensitivity of the Bayes factor to the prior distribution for a given
parameter θ by computing the Bayes factor for a vector of parameter values (or
a grid of parameter values in the case of a two-dimensional vector parameter θ).
The result is a graphical representation of the Bayes factor (i.e., a sensitivity curve)
as a function of θ, say BF_θ. In this way, one can get an idea about the Bayes
factor one could obtain for different values of θ, and thus about the sensitivity
of the Bayes factor to various prior distributions. These prior distributions have
their mass concentrated on different regions of the parameter space. For
one- or two-dimensional problems, the inspection of a sensitivity curve represents
a straightforward and effective approach to study the impact of varying parameter
values on the BF under consideration. An example is given in Sect. 2.3.1 for the
choice of the prior distribution about a Normal mean. A sensitivity analysis with
respect to the prior probability assessments of competing propositions is provided
in Sect. 3.2.3.
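A cruder check than the sensitivity curve described above is to recompute the Bayes factor under a handful of alternative priors. The sketch below does this for a one-sided comparison about a proportion with a beta prior and binomial data. It is not the procedure of Sinharay and Stern (2002); the data (12 positives among 40 items), the threshold 0.2, the candidate priors, and the function name bf_beta_binomial are assumptions chosen for illustration.

```r
# Bayes factor for H1: theta > th0 versus H2: theta <= th0, with a
# Be(a, b) prior and x successes observed in n Bernoulli trials.
bf_beta_binomial <- function(x, n, th0, a, b) {
  prior1 <- pbeta(th0, a, b, lower.tail = FALSE)
  post1  <- pbeta(th0, a + x, b + n - x, lower.tail = FALSE)
  (post1 / (1 - post1)) / (prior1 / (1 - prior1))
}

# Illustrative candidate priors (assumed, not taken from the text).
priors <- data.frame(a = c(1, 0.5, 2, 1), b = c(1, 0.5, 2, 4))
priors$BF <- mapply(bf_beta_binomial, a = priors$a, b = priors$b,
                    MoreArgs = list(x = 12, n = 40, th0 = 0.2))
priors
```

Comparing the rows of the resulting table gives a first impression of how strongly the reported value depends on the hyperparameters.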

A further layer of sensitivity analyses relates to the choice of the utility/loss 
function. An example is presented in Sect. 2.2.3 for the choice of the loss function in 
the context of inference and decision about a population proportion. Section 4.4.1.2 
gives an example for the investigation of the effect of different prior probabilities 
and loss values in the context of classification of skeletal remains. 

A sensitivity analysis for Monte Carlo and Markov chain Monte Carlo proce- 
dures is presented in Sects. 2.2.2 and 3.4.1.3. In Sect. 4.3.3, a sensitivity analysis is 
developed for the choice of a smoothing parameter in a kernel density estimation. 


1.12 Using R


R is a rich environment for data analysis and statistical computing. In its base 
package, it contains a large collection of functions for exploring, summarizing, 
and representing data graphically, handling many standard probability distributions 
and more. R includes a simple programming language that users can extend with 
new functions. Some basic instructions on the use of R or of particular functions 
are available from the R Help menu, by using the command help.start(). The 
reader can refer to, for example, Verzani (2014) for a detailed introduction to the 
use of R for descriptive and inferential statistics, to Albert (2009) for an overview 
of elements of Bayesian computation with R, and to the R project home page 
(https://www.r-project.org) for more references. Datasets and routines used in the
examples throughout this book are available on the website of this book (on
http://link.springer.com/).

Generally, we will give results of R computations as produced directly by R. We 
do not make any recommendations as to the level of precision that scientists should 
use when reporting numerical results. 




Published with the support of the Swiss National Science Foundation (Grant no. 
10BP12_208532/1). 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Chapter 2
Bayes Factor for Model Choice


2.1 Introduction 


The Bayes factor can assist forensic scientists in the evaluation of findings when 
recipients of expert information need help in discriminating between propositions 
concerning, for example, a parameter of interest. A typical example is the discrimi- 
nation between competing propositions regarding the concentration of a controlled 
substance (e.g., drugs in blood) with respect to a given threshold. This chapter will 
approach one-sided hypothesis testing involving model parameters in the form of a 
proportion (Sect. 2.2) and a mean (Sect. 2.3). In both situations, additional factors, 
such as errors (Sects. 2.2.2 and 2.3.2), are considered. Aspects of decision-making 
are also considered (Sects. 2.2.3 and 2.3.3). 

Throughout this chapter, the Bayes factor will be obtained as a ratio of marginal 
likelihoods following the ideas described in Sect. 1.4. The greater marginal likeli- 
hood will support the respective proposition against the other. This, along with other 
aspects, such as the decision maker’s preferences among adverse consequences, has 
an impact on the decision-making process. 


2.2 Proportion 


A common problem in forensic practice is the investigation of the proportion of 
items or individuals that present a characteristic of interest, e.g., the proportion 
of seized pills containing a controlled substance or the proportion of counterfeit 
medicines in a given population. A consignment of items is considered a random 
sample from a super-population of items of the same type, and the parameter θ is the
proportion of units in the super-population that present the target characteristic. Note
that for consignments of very large size (i.e., several thousands), a finite number of
units will correspond to each positive value of θ. For consignments of small size
(i.e., smaller than 50), the parameter θ is a nuisance parameter (i.e., one that is not
of primary interest) that can be integrated out, leaving a probability distribution for
the unknown number of items having the target characteristic. For consignments of
intermediate size, θ can be treated as a continuous value in the interval (0, 1) (e.g.,
Aitken et al., 2021). As an example, consider the following pair of propositions:


H1: The proportion θ of items having the characteristic of interest is larger than
θ0.

H2: The proportion θ of items having the characteristic of interest is smaller than
or equal to θ0,


where θ0 ∈ (0, 1) is a given threshold of legal interest.¹ Note that applications of
this type of propositions are broad and include, for example, quality control of food
(and other consumer products), the analysis of levels of contamination of laboratory
equipment, and the extent of environmental pollution.

This section covers three main topics: (1) inference about an unknown proportion
θ (Sect. 2.2.1), (2) inference about θ when background elements may affect the
counting process (Sect. 2.2.2), and (3) decision regarding competing propositions
about θ (Sect. 2.2.3).


2.2.1 Inference About a Proportion 


Consider a case of inference about a population parameter based on a sample of 
size n. Aitken (1999) and Aitken et al. (2021) discuss how to choose a sample size. 
Suppose that among the n items, x show a characteristic that is of interest from a
legal point of view. The question then is how such an analytical result supports one
or the other of the competing propositions regarding the proportion of items in the 
population that have the target characteristic. 

Experiments of this kind can be regarded as Bernoulli trials (after the Swiss 
mathematician Jacob Bernoulli, 1654-1705), where trials are independent and 
give rise to one of the two mutually exclusive outcomes, conventionally labeled 
success and failure, with constant probability of success in each trial. The binomial 
distribution Bin(n, θ) is a statistical model for data that arise from a sequence of
Bernoulli trials:

\[ f(x \mid n, \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x}, \qquad x = 0, 1, \ldots, n. \]

In the Bayesian perspective, the most common prior distribution for the parameter
of interest θ is the beta distribution Be(α, β):

\[ f(\theta \mid \alpha, \beta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}, \qquad 0 < \theta < 1; \; \alpha, \beta > 0, \]

with B(α, β) = Γ(α)Γ(β)/Γ(α + β).


¹ See Biedermann et al. (2012, 2018) for a general discussion of thresholds of legal interest when
data are continuous.


The beta prior distribution and the binomial likelihood are conjugate (see
Sect. 1.10): after inspecting a sample, one can easily compute the posterior
distribution, which is still beta, Be(α*, β*), with parameters updated according to
well-known updating rules, α* = α + x, β* = β + n − x (e.g., Lee, 2012). The prior
odds, the posterior odds, and the Bayes factor can be easily computed, as discussed
in Sect. 1.4, by means of standard routines.
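The updating rules follow from multiplying the binomial likelihood by the beta prior density and dropping every factor that does not depend on θ. Writing f(θ | x) for the posterior density (a shorthand introduced here), one has

```latex
\begin{align*}
f(\theta \mid x) &\propto f(x \mid n, \theta) \, f(\theta \mid \alpha, \beta) \\
                 &\propto \theta^{x} (1 - \theta)^{n - x} \; \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \\
                 &= \theta^{\alpha + x - 1} (1 - \theta)^{\beta + n - x - 1},
\end{align*}
```

which is the kernel of a Be(α + x, β + n − x) density.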


Example 2.1 (Counterfeit Medicines) Consider a case in which a large batch 
of medicines (say, N > 50) is seized, suspected to contain counterfeit items. 
The following propositions are of interest: 


H1: The proportion θ of counterfeit medicines is greater than 0.2.
H2: The proportion θ of counterfeit medicines is not greater than 0.2.


Suppose that, initially, limited information is available so that a uniform prior
distribution is chosen over the interval (0, 1), that is, θ ~ Be(1, 1). Note that
although a prior distribution Be(1, 1) is often called uninformative, it is in
fact informative (see Sect. 1.10 and de Finetti (1993b)). It conveys the view
that every value of θ in the interval (0, 1) is considered equally probable. The
prior odds can then easily be obtained.


> th=0.2
> a=1
> b=1
> pi1=pbeta(th,a,b,lower.tail=F)
> prior_odds=pi1/(1-pi1)
> prior_odds

[1] 4


A uniform prior distribution clearly favors, a priori, hypothesis H1, that θ is
greater than 0.2. Next, suppose that a sample of size 40 is analyzed and 12
out of 40 items are found to be positive (counterfeit). The posterior distribution
follows immediately, and so do the posterior odds and the Bayes factor.


> n=40
> x=12
> astar=a+x
> bstar=b+n-x
> alpha1=pbeta(th,astar,bstar,lower.tail=F)
> post_odds=alpha1/(1-alpha1)
> post_odds

[1] 18.19594


The posterior probability of proposition H1 is, therefore, approximately 18
times greater than the posterior probability of the alternative proposition H2.
Thus, the Bayes factor can be obtained as


> BF=post_odds/prior_odds
> BF

[1] 4.548985


The Bayes factor indicates that the evidence is in favor of proposition H1
that the proportion of counterfeit medicines is greater than 0.2, rather than
proposition H2 (i.e., θ ≤ 0.2). According to the verbal scale presented in
Table 1.2, the BF weakly supports proposition H1 over H2.
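Section 2.1 states that the Bayes factor is obtained as a ratio of marginal likelihoods. As a hedged cross-check of the value found in Example 2.1, the following sketch computes the two marginal likelihoods by numerical integration with the base R function integrate, using the uniform prior restricted and renormalized to the region specified by each proposition. The variable names lik_prior, m1, m2, and BF_ml are ours.

```r
x <- 12; n <- 40; th0 <- 0.2; a <- 1; b <- 1

# Likelihood times prior density, as a function of theta.
lik_prior <- function(theta) dbinom(x, n, theta) * dbeta(theta, a, b)

# Prior probabilities of H1 (theta > th0) and H2 (theta <= th0).
pi1 <- pbeta(th0, a, b, lower.tail = FALSE)
pi2 <- 1 - pi1

# Marginal likelihoods under the prior restricted to each proposition.
m1 <- integrate(lik_prior, th0, 1)$value / pi1
m2 <- integrate(lik_prior, 0, th0)$value / pi2

BF_ml <- m1 / m2
BF_ml
```

Up to numerical integration error, the result should agree with the ratio of posterior to prior odds computed in the example.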


To help specify the prior distribution, information in the form of data regarding 
similar consignments from cases with comparable circumstances may be used. 
Such data may suggest a distribution other than the uniform distribution used in 
the above example. An example of how to elicit a subjective prior distribution 
about a proportion is provided in Sect. 1.10. For a more extensive discussion about
prior elicitation for a proportion, the reader can refer to O’Hagan et al. (2006).
Forensically relevant applications of prior elicitation for θ are discussed in Aitken
(1999). Note, however, that in certain practical applications, analytical results may
be affected by further factors that cannot be dissociated from the observational
process. An example of such a factor is considered in Sect. 2.2.2.

The analysis pursued above focused on the problem of inference about a
proportion for a large batch. Consider now the case where the size N of the
consignment is small (less than 50). Suppose a sample of size n is inspected and x
items are found to present the target characteristic (e.g., yield a positive test result),
so that θ ~ Be(α + x, β + n − x). Denote by Y the unknown number of positive
items in the uninspected part of the consignment. This random variable still has a
binomial distribution, Y ~ Bin(m, θ), where m = N − n represents the number
of units that have not been inspected. The probability distribution for the unknown
number of positive units can be obtained by integrating out parameter θ. This turns
out to be a beta-binomial distribution Be-Bin(n, m, x, α, β):

\[ \Pr(Y = y \mid n, m, x, \alpha, \beta) = \frac{\Gamma(n + \alpha + \beta) \binom{m}{y} \Gamma(y + x + \alpha) \, \Gamma(n + m - x - y + \beta)}{\Gamma(x + \alpha) \, \Gamma(n - x + \beta) \, \Gamma(n + m + \alpha + \beta)}, \qquad y = 0, 1, \ldots, m \tag{2.1} \]

(Aitken, 1999).


Example 2.2 (Counterfeit Medicines—Small Consignment) Consider Exam- 
ple 2.1 and suppose now that the consignment is small, say N = 40. 
Suppose further that a sample of size n = 10 has been inspected and that 
2 items are found to be counterfeit. Starting from a uniform prior distribution 
θ ~ Be(1, 1), the beta posterior distribution becomes θ ~ Be(3, 9).


> N=40
> n=10
> x=2
> a=1
> b=1
> astar=a+x
> bstar=b+n-x


The distribution of Y then is Be-Bin(10, 30, 2, 1, 1). The probability of
observing a given number of counterfeit items (e.g., y = 1) in the remainder
of the consignment can be obtained using the function dbbinom that is
available in the package extraDistr (Wolodzko, 2020).


> library (extraDistr) 
> dbbinom(1,N-n,astar,bstar) 


[1] 0.03665943 


One can also use the function pbbinom, which computes the cumulative
distribution function for the beta-binomial random variable in (2.1). For
example, the probability of observing at most 2 counterfeit items can be obtained
as


> pbbinom(2,N-n,astar,bstar) 
[1] 0.109604 
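As a cross-check that does not depend on an additional package, the probabilities in (2.1) can also be coded directly in base R. The sketch below is ours: the function name dbetabin is hypothetical, and the numerical values are those of Example 2.2.

```r
# Beta-binomial probability (2.1), computed on the log scale for stability.
dbetabin <- function(y, n, m, x, a, b) {
  exp(lchoose(m, y) + lgamma(n + a + b) + lgamma(y + x + a) +
      lgamma(n + m - x - y + b) - lgamma(x + a) - lgamma(n - x + b) -
      lgamma(n + m + a + b))
}

N <- 40; n <- 10; x <- 2; a <- 1; b <- 1
m <- N - n                      # number of uninspected units
probs <- dbetabin(0:m, n, m, x, a, b)

probs[2]                        # Pr(Y = 1)
sum(probs[1:3])                 # Pr(Y <= 2)
sum(probs)                      # probabilities over y = 0, ..., m
```

The first two values can be compared with the output of dbbinom and pbbinom reported in the example, and the last line verifies that the probabilities add up to one over the support.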


A Bayesian network for inference about a proportion of a small consignment
has been developed in Biedermann et al. (2008). Posterior probabilities for θ
can easily be calculated with such models.


2.2.2 Background Elements Affecting Counting Processes


In many real-world applications, counting processes performed in forensic laborato- 
ries cannot be considered error-free. Examinations may be affected by inefficiencies 
and perturbing factors. For example, it may be that items are lost or missed during 
counting or that background elements are present, i.e., objects observationally indis- 
tinguishable from the target objects. This section addresses inferential challenges 
due to such background elements. 

Suppose that x is the number of recorded successes, i.e., the number of times that
the target characteristic is detected. However, the number x may not correspond
to the number x_s of items actually showing the characteristic of interest but be
affected by a certain number of background elements, x_b, that are wrongly counted
as successes. This complication may typically arise in applications where the items
of interest are small particles. Consider, for example, the assessment of rice quality
in a context of food quality control. Rice quality can be measured by means of
several features, such as the percentage of cracked or immature grains. For example,
there may be legal provisions regarding the maximum tolerated amount of cracked
grains.² It might then be of interest to compare alternative propositions according
to which the percentage of cracked grains is above or below a given regulatory
threshold. A key question is how to conduct such a comparison when the counting
process may be affected by background elements, e.g., oil seeds in the example here.

While the number of elements actually showing the target characteristic is
modeled as the outcome of a binomial distribution, X_s ~ Bin(n, θ), the number
of background elements affecting the counting process, x_b, can be modeled by a
Poisson distribution, X_b ~ Pn(λ), where λ is the mean number of background
elements (D’Agostini, 2004). The total number of recorded successes is therefore
X = X_s + X_b. The graphical model (see e.g. Cowell et al., 1999) in Fig. 2.1 offers
a schematic representation of the probabilistic relationship among the variables.


Fig. 2.1 Graphical representation of the statistical model for inference
about a population proportion based on count data in the presence of background
elements affecting counting processes


² For legislation in, e.g., Italy, see Gazzetta Ufficiale della Repubblica Italiana, 6, 09-01-2018,
Decreto 20 settembre 2017.




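It can be shown³ that X has the following probability distribution:

\[ f(x \mid n, \theta, \lambda) = \sum_{x_b = 0}^{x} \binom{n}{x - x_b} \theta^{x - x_b} (1 - \theta)^{n - x + x_b} \, \frac{e^{-\lambda} \lambda^{x_b}}{x_b!}. \]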
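The sum over the possible numbers of background elements can be checked numerically. The following sketch codes the probability function of X and verifies that it adds up to one and has mean nθ + λ, as expected for the sum of a binomial and a Poisson count. The function name dbinpois_sum and the values of n, θ, and λ are assumptions chosen for illustration.

```r
# Probability function of X = Xs + Xb, with Xs ~ Bin(n, theta) and
# Xb ~ Pn(lambda), obtained by summing over the possible values of xb.
dbinpois_sum <- function(x, n, theta, lambda) {
  xb <- 0:x
  sum(dbinom(x - xb, n, theta) * dpois(xb, lambda))
}

# Assumed illustrative values.
n <- 20; theta <- 0.3; lambda <- 1.5
xs <- 0:(n + 60)     # the Poisson tail beyond 60 is negligible for this lambda
px <- sapply(xs, dbinpois_sum, n = n, theta = theta, lambda = lambda)

sum(px)              # total probability
sum(xs * px)         # mean of X, to be compared with n * theta + lambda
```

Terms with x − xb larger than n contribute zero, because dbinom returns zero outside the support of the binomial distribution.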


Recall that prior uncertainty about θ can be modeled by a beta distribution Be(α, β).
The posterior distribution is then given by

\[ f(\theta \mid n, x, \lambda) = \frac{\sum_{x_b = 0}^{x} \binom{n}{x - x_b} \theta^{x - x_b} (1 - \theta)^{n - x + x_b} \, e^{-\lambda} \lambda^{x_b} / x_b! \;\; \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{f(x \mid n, \lambda) \, B(\alpha, \beta)}, \tag{2.2} \]

where the normalizing constant f(x | n, λ) in the denominator is

\[ f(x \mid n, \lambda) = \int f(x \mid n, \theta, \lambda) f(\theta) \, d\theta. \tag{2.3} \]

The posterior distribution (2.2) cannot be obtained in closed form as the integral
characterizing the normalizing constant f(x | n, λ) is not tractable analytically.
However, since it is possible to draw values from the beta distribution, the integral
in (2.3) can be computed by Monte Carlo approximation as in (1.30), that is,

\[ \hat{f}(x \mid n, \lambda) = \frac{1}{N} \sum_{i = 1}^{N} f(x \mid n, \theta^{(i)}, \lambda), \tag{2.4} \]

where θ^(i) ~ Be(α, β).


Example 2.3 (Rice Quality) Consider a consignment of rice and suppose that 
it is of interest to assess whether the proportion of cracked grains is below a 
given level of tolerance. The following competing propositions may be of 
interest: 


H1: The proportion θ of cracked grains is greater than 0.025.
H2: The proportion θ of cracked grains is smaller than or equal to 0.025.


In a sample of 1000 grains, a total of 28 cracked grains are observed. 


> n=1000
> x=28


³ The method for finding the distribution of a sum of random variables is given, for example, in
Casella and Berger (2002). It can be used to extend the model to the case of missing counts, an
aspect that is not treated here.


The beta prior distribution for θ needs to be elicited. Suppose that available
knowledge indicates that it is implausible that the proportion of cracked
grains is greater than 5%. An asymmetric prior distribution with a prior mass
concentrated over values lower than 0.05 can be elicited as follows. Start with
α = 1 and β = 1, then increment β by 1 until the shape of the beta distribution
is such that Pr(θ > 0.05) is small, e.g., no greater than 0.1.


> a=1
> b=1
> while (pbeta(0.05,a,b,lower.tail=F)>0.1) {b=b+1}
> c(a,b,pbeta(0.05,a,b,lower.tail=F))

[1]  1.00000000 45.00000000  0.09944026


The parameters α and β can thus be set equal to 1 and 45, respectively.
Figure 2.2 (left) can be obtained with


> plot(function(x) dbeta(x,a,b),0,0.1,xlab=expression
+ (theta),ylab=expression(paste(pi)*paste('(')*
+ paste(theta)*paste(')')))


The prior odds can now be computed in a straightforward manner. 


> th0=0.025
> pi1=pbeta(th0,a,b,lower.tail=F)
> prior_odds=pi1/(1-pi1)
> prior_odds

[1] 0.4706802


This value, approximately 0.5, means that the probability of hypothesis H2 is,
a priori, approximately 2 times greater than the probability of hypothesis H1.

Suppose that when inspecting a sample of 1000 rice grains, on average, 1
grain (e.g., oil seed) is wrongly counted as cracked. Parameter λ can thus be
taken to be equal to 0.001.

First, we write a function dbinpois that computes the product between
a binomial likelihood Bin(n, θ) at x − x_b and a Poisson likelihood Pn(λ) at
x_b.


> dbinpois=function(xb){
+ dbinom((x-xb),n,theta)*dpois(xb,lambda)}


The unnormalized posterior distribution in (2.2)

\[ \sum_{x_b = 0}^{x} \binom{n}{x - x_b} \theta^{x - x_b} (1 - \theta)^{n - x + x_b} \, \frac{e^{-\lambda} \lambda^{x_b}}{x_b!} \, \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} \]

is computed as


> lambda=0.001
> xb=matrix(seq(0,x,1),nrow=1)
> incr=0.0001
> thetav=seq(0.0001,0.9999,incr)
> # unnormalized posterior at the first grid point
> theta=thetav[1]
> s=sum(apply(xb,2,dbinpois))
> upost=dbeta(theta,a,b)*s
> # remaining grid points
> for (i in 2:length(thetav)){
+ theta=thetav[i]
+ s=sum(apply(xb,2,dbinpois))
+ upost=c(upost,dbeta(theta,a,b)*s)
+ }


The normalizing constant f(x | n, λ) can be approximated as in (2.4)


> # first draw from the prior, then nn-1 further draws
> theta=rbeta(1,a,b)
> norm_const=sum(apply(xb,2,dbinpois))
> nn=10000
> for (i in 2:nn){
+ theta=rbeta(1,a,b)
+ s=sum(apply(xb,2,dbinpois))
+ norm_const=norm_const+s
+ }
> norm_const=norm_const/nn


and the approximated posterior density, represented in Fig. 2.2 (right), can be
obtained as


> normpost=upost/(norm_const)
> plot(thetav,normpost,xlab=expression(paste(theta)),
+ ylab=expression(hat(f)*paste('(')*paste(theta)*
+ paste('|n,x,')*paste(lambda)*paste(')')),
+ xlim=c(0,0.1),type='l')


Fig. 2.2 Left: beta prior distribution Be(1, 45) of the unknown proportion θ of cracked grains
(Example 2.3). Right: approximated posterior distribution of θ, f(θ | n, x, λ). The gray shaded
area shows the posterior probability of the hypothesis H1 (θ > 0.025)


To calculate the BF, we need to obtain the posterior probabilities of the competing
propositions H1 and H2. Consider proposition H2. The (approximate) posterior
probability of proposition H2 can be obtained by Monte Carlo integration as

\[ \begin{aligned}
\alpha_2 &= \int_{0}^{\theta_0} f(\theta \mid n, x, \lambda) \, d\theta
          = \frac{1}{f(x \mid n, \lambda)} \int_{0}^{\theta_0} f(x \mid n, \theta, \lambda) f(\theta) \, d\theta \\
         &= \frac{\theta_0}{f(x \mid n, \lambda)} \int_{0}^{\theta_0} f(x \mid n, \theta, \lambda) f(\theta) \frac{1}{\theta_0} \, d\theta \\
         &\approx \frac{\theta_0}{\hat{f}(x \mid n, \lambda)} \, \frac{1}{N} \sum_{i = 1}^{N} f(x \mid n, \theta^{(i)}, \lambda) f(\theta^{(i)}),
\end{aligned} \tag{2.5} \]

where θ^(i) is sampled from a uniform distribution in the interval (0, θ0), θ^(i) ~
Unif(0, θ0), and the normalizing constant f(x | n, λ) is approximated as in (2.4).
The (approximate) posterior probability of hypothesis H1 is α1 = 1 − α2. The
(approximated) BF will be

\[ \mathrm{BF} = \frac{\alpha_1 / \alpha_2}{\pi_1 / \pi_2}. \tag{2.6} \]


Example 2.4 (Rice Quality, Continued) Consider the scenario described in
Example 2.3, and compute the (approximate) posterior probability of the
hypothesis H2: the proportion θ of cracked grains is smaller than or equal
to 0.025 (as in (2.5)).


> m=10000
> theta=runif(m,0,th0)
> alpha2=mean(rowSums(apply(xb,2,dbinpois))
+ *dbeta(theta,a,b))*th0/norm_const
> alpha2

[1] 0.30753


The (approximate) posterior probability of hypothesis H1 then is α1 =
0.6925. This is highlighted by the gray shaded area in Fig. 2.2 (right). The
posterior odds and the BF therefore are


> post_odds=(1-alpha2)/(alpha2)
> post_odds

[1] 2.251715


> BF=post_odds/prior_odds
> BF

[1] 4.783959


The Bayes factor indicates that the evidence favors hypothesis H1, i.e., θ >
0.025, over H2, i.e., θ ≤ 0.025. A BF of approximately 5 provides limited
support for the hypothesis H1. Note that the results obtained by the laboratory
analyses clearly affect our belief about θ. The analytical results change prior
odds of 0.47, which favor H2, to posterior odds of approximately 2.25 in favor of
H1.
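Because the Monte Carlo results above vary from run to run, a deterministic cross-check can be useful. The sketch below replaces both Monte Carlo steps by a midpoint rule on a fine grid over (0, 1). This is a simple numerical quadrature swapped in for illustration, not the procedure used in the example; the function name lik and the step size h are assumptions, while the data, prior, and threshold are those of Examples 2.3 and 2.4.

```r
n <- 1000; x <- 28; lambda <- 0.001   # data and Poisson parameter (Example 2.3)
a <- 1; b <- 45; th0 <- 0.025         # elicited prior and threshold

# f(x | n, theta, lambda) evaluated for a vector of theta values.
lik <- function(theta) {
  xb <- 0:x
  sapply(theta, function(t) sum(dbinom(x - xb, n, t) * dpois(xb, lambda)))
}

# Midpoint rule with step h (assumed value) over the interval (0, 1).
h <- 1e-4
grid <- seq(h / 2, 1 - h / 2, by = h)
upost <- lik(grid) * dbeta(grid, a, b)           # unnormalized posterior
norm_const <- sum(upost) * h                     # approximates (2.3)
alpha2 <- sum(upost[grid <= th0]) * h / norm_const

pi1 <- pbeta(th0, a, b, lower.tail = FALSE)
BF <- ((1 - alpha2) / alpha2) / (pi1 / (1 - pi1))
c(alpha2 = alpha2, BF = BF)
```

The values obtained in this way should be close to, but not identical with, the Monte Carlo results of Example 2.4, since the latter are subject to simulation error.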


2.2.2.1 Sensitivity to Monte Carlo Approximation 


The Monte Carlo estimate of the Bayes factor obtained in (2.6) is subject to 
variability, which may be a source of concern. Figure 2.3 provides an illustration of 
BF variability. The figure shows 500 realizations of the BF approximation in (2.6). 


> ns=500
> m=10000
> BFs=0
> dbinpois=function(xb){
+ dbinom((x-xb),n,theta)*dpois(xb,lambda)}
> for (j in 1:ns){
+ # normalizing constant as in (2.4)
+ rthetav=rbeta(m,a,b)
+ norm_const=0
+ for (i in 1:m){
+ theta=rthetav[i]
+ s=sum(apply(xb,2,dbinpois))
+ norm_const=norm_const+s
+ }
+ norm_const=norm_const/m
+ # posterior probability of H2 as in (2.5), then the BF as in (2.6)
+ theta=runif(m,0,th0)
+ alpha2=mean(rowSums(apply(xb,2,dbinpois))
+ *dbeta(theta,a,b))*th0/norm_const
+ post_odds=(1-alpha2)/alpha2
+ BFs=c(BFs,post_odds/prior_odds)
+ }
> BFs=BFs[-1]
> hist(BFs,main='',prob=T)
> curve(dnorm(x,mean(BFs),sd(BFs)),lwd=2,add=T)


3.5 
| 


3.0 


Density 


1.5 


1.0 


0.5 
i 


0.0 
L 


BFs 


Fig. 2.3 Histogram of 500 realizations of the BF approximation in (2.6), where the posterior 
probability of hypothesis Hz is obtained as in (2.5). The solid line represents the fitted Normal 
density 


The purpose of the graphical representation in Fig. 2.3 is to illustrate that the
repeated application of the procedure leads to a distribution of BFs. While the Monte
Carlo estimate is not an exact value, it can be shown that the approximation error can
be made arbitrarily small by generating a sufficiently large number of draws.
For a large number N of simulations, it can also be proven, by the Central Limit
Theorem, that the scaled error √N (f̂(x) − f(x)) is approximately normally
distributed. This can be used to analyze the variability of the Monte Carlo estimate
(see, e.g., Marin and Robert (2014)). Note that the shape of the histogram in
Fig. 2.3 is roughly symmetric and bell-shaped.
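
The shrinking of the Monte Carlo error with the number of draws can be checked on a toy problem whose exact answer is known. The following is only a minimal sketch under stated assumptions: the target Pr(θ < 0.025) under a Be(1, 45) distribution, the seed, and the names `mc_estimate`, `se_theory`, and `spread` are illustrative choices and not part of the rice quality example.

```r
# Toy target (an assumption for illustration): p = Pr(theta < 0.025)
# under a Be(1, 45) distribution; the exact value is available via pbeta().
set.seed(123)
p_exact <- pbeta(0.025, 1, 45)

# Plain Monte Carlo estimate of p based on N draws
mc_estimate <- function(N) mean(rbeta(N, 1, 45) < 0.025)

# 500 repetitions for a small and a large number of draws
est_small <- replicate(500, mc_estimate(100))
est_large <- replicate(500, mc_estimate(10000))

# CLT-based standard error of the estimate: sqrt(p (1 - p) / N)
se_theory <- function(N) sqrt(p_exact * (1 - p_exact) / N)

# Empirical spread of the repeated estimates versus the theoretical value
spread <- rbind(empirical = c(sd(est_small), sd(est_large)),
                theory    = c(se_theory(100), se_theory(10000)))
colnames(spread) <- c("N=100", "N=10000")
print(spread)
```

Increasing the number of draws by a factor of 100 should reduce the spread of the estimates by roughly a factor of 10, in line with the √N scaling mentioned above.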

It is worth noting that other, more efficient ways than traditional Monte Carlo 
methods may be implemented to compute the integrals related to the posterior 
probabilities of the competing hypotheses. Importance sampling (see Sect. 1.8), 
for example, can improve the integral approximation. It can also be used when 
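the target density is unnormalized. Consider again the posterior probability of
hypothesis H2:

\[
\alpha_2 = \int_0^{\theta_0} \frac{f(x \mid n, \theta, \lambda)\, f(\theta)}{f(x \mid n, \lambda)}\, d\theta .
\]

This can be rewritten as

\[
\alpha_2 = \frac{1}{f(x \mid n, \lambda)} \int_0^1 h(\theta)\, \frac{f(x \mid n, \theta, \lambda)\, f(\theta)}{g(\theta)}\, g(\theta)\, d\theta
         = \frac{1}{f(x \mid n, \lambda)} \int_0^1 h(\theta)\, w(\theta)\, g(\theta)\, d\theta ,
\]

where

\[
h(\theta) =
\begin{cases}
1 & \text{if } 0 \le \theta \le \theta_0, \\
0 & \text{if } \theta_0 < \theta \le 1,
\end{cases}
\]

w(θ) = f(x | n, θ, λ) f(θ)/g(θ) and g(θ) is the importance sampling function.
The posterior probability α2 can be approximated as

\[
\hat{\alpha}_2 = \frac{\frac{1}{N}\sum_{i=1}^{N} h(\theta^{(i)})\, w(\theta^{(i)})}{\frac{1}{N}\sum_{i=1}^{N} w(\theta^{(i)})},
\tag{2.7}
\]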


where θ^(i) ~ g(θ).

Example 2.5 (Rice Quality, Continued) A Be(20, 780) is chosen as importance
sampling function g(θ). It can be readily verified that it is centered at
0.025 and that the density rapidly collapses toward zero for values greater than
0.04. This will avoid the generation of points for which the integrand is close
to zero, with a very modest contribution to the approximation. Next, sample
10000 values from this distribution.

> m=10000
> a1=20
> b1=780
> theta=rbeta(m,a1,b1)

The posterior probability α2 of hypothesis H2 can be obtained as in (2.7)

> fx=rep(0,m)
> fx[theta<th0]=1
> num=mean(rowSums(apply(xb,2,dbinpois))*
+ dbeta(theta,a,b)/dbeta(theta,a1,b1)*fx)
> den=mean(rowSums(apply(xb,2,dbinpois))*
+ dbeta(theta,a,b)/dbeta(theta,a1,b1))
> alpha2=num/den
> alpha2
[1] 0.3079344

> BF=((1-alpha2)/alpha2)/prior_odds
> BF
[1] 4.774886


Figure 2.4 provides an illustration of BF variability. Notice that while the BFs 
in Figs. 2.3 and 2.4 have roughly the same location, the importance sampling 
in (2.7) produced an increase in precision. 


It is important to understand that the resulting distribution does not mean 
that there is a distribution for a given BF because the BF, by definition, is a 
single number. See, e.g., Taroni et al. (2016) and Biedermann et al. (2017a) for 
discussions of this topic among forensic statisticians and forensic scientists. The 
error resulting from the implementation of numerical techniques is an important 
source of information about which the scientist should be transparent. Following 
ideas presented in Tanner (1996), recently reconsidered by Ommen et al. (2017) in 
a forensic context, the numerical precision in the overall approximated value can be 
estimated by the associated Monte Carlo standard error.

Fig. 2.4 Histogram of 500 realizations of the BF approximation in (2.6), where the posterior
probability of hypothesis H2 is obtained as in (2.7). The solid line represents the fitted Normal
density


2.2.2.2 Unknown Expected Value of the Number of Background Elements 


It is important to note that, contrary to what was developed in Example 2.3, the
expected value λ of the number of background events is generally unknown. The
uncertainty about λ can be modeled by means of a gamma distribution, λ ~
Ga(a, b). The marginal posterior distribution of parameter θ, written f(θ | n, x),
now takes a more complicated form as one needs to handle the joint posterior
distribution that is proportional to

\[
f(\theta, \lambda \mid n, x) \propto \sum_{x_b=0}^{x} \binom{n}{x-x_b}\, \theta^{x-x_b} (1-\theta)^{n-x+x_b}\,
\frac{e^{-\lambda}\lambda^{x_b}}{x_b!}\; \theta^{\alpha-1}(1-\theta)^{\beta-1}\, \lambda^{a-1} e^{-b\lambda}.
\tag{2.8}
\]


Following ideas described in Taroni et al. (2010), a two-block M-H algorithm 
(Sect. 1.8) can be implemented in order to draw a sample from the joint posterior 
distribution in (2.8). For each block, the candidate generating density is taken to be 
Normal with the mean equal to the current value of the parameter and the variance 
chosen so as to obtain a good acceptance rate (Gamerman & Lopes, 2006).

Consider the parameter θ first. The full conditional density of θ is proportional
to

\[
f_1(\theta \mid \lambda, n, x) \propto \sum_{x_b=0}^{x} \binom{n}{x-x_b}\, \theta^{x-x_b}(1-\theta)^{n-x+x_b}\,
\frac{\lambda^{x_b}}{x_b!}\; \theta^{\alpha-1}(1-\theta)^{\beta-1}.
\]

Starting from the current value for θ, say θ^(i−1), a candidate value θ^prop for θ can
be obtained as

\[
\theta^{prop} = \frac{e^{\psi^{prop}}}{1+e^{\psi^{prop}}}, \quad \text{where } \psi^{prop} \sim N(\psi^{(i-1)}, \tau_1^2)
\]

and ψ^(i−1) = log(θ^(i−1)/(1 − θ^(i−1))). In this way, the proposed value θ^prop will be
defined in the interval (0, 1). The candidate value θ^prop is accepted with probability

\[
\alpha(\psi^{(i-1)}, \psi^{prop}) = \min\left\{1, \frac{f(\psi^{prop} \mid \lambda^{(i-1)})}{f(\psi^{(i-1)} \mid \lambda^{(i-1)})}\right\},
\]

where f(ψ | λ) is the reparametrized full conditional density of parameter θ and
can be obtained as

\[
f(\psi \mid \lambda) = \frac{e^{\psi}}{(1+e^{\psi})^2}\, f_1\!\left(\frac{e^{\psi}}{1+e^{\psi}} \,\Big|\, \lambda, n, x\right).
\]

See, e.g., Casella and Berger (2002) for distributions of functions of random
variables.
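
The change of variable behind the reparametrized density can be verified numerically. The sketch below is an illustration under assumptions: it uses a Be(2, 5) density as a stand-in for the full conditional of θ (an arbitrary choice, not the density of the example), and the names `f_theta` and `f_psi` are hypothetical. It checks that multiplying by the Jacobian e^ψ/(1 + e^ψ)² yields a proper density on the logit scale that assigns the same probabilities as the original density.

```r
# Stand-in density on (0, 1): Be(2, 5) (arbitrary choice for illustration)
f_theta <- function(theta) dbeta(theta, 2, 5)

# Density of psi = log(theta / (1 - theta)): f_theta evaluated at the
# back-transformed value, multiplied by the Jacobian e^psi / (1 + e^psi)^2
f_psi <- function(psi) {
  theta <- plogis(psi)                   # e^psi / (1 + e^psi)
  f_theta(theta) * theta * (1 - theta)   # Jacobian equals theta * (1 - theta)
}

# The reparametrized density should integrate to 1 over the real line
total <- integrate(f_psi, -Inf, Inf)$value

# Pr(theta < 0.3) computed on either scale should agree
p_theta <- pbeta(0.3, 2, 5)
p_psi   <- integrate(f_psi, -Inf, qlogis(0.3))$value
print(c(total = total, p_theta = p_theta, p_psi = p_psi))
```

The same reasoning applies to the log transformation used for a positive parameter, where the Jacobian is e^φ.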

If the candidate θ^prop is accepted, it becomes the current value of the chain, i.e.,
θ^(i) = θ^prop; otherwise θ^(i) = θ^(i−1).

The second block refers to parameter λ. The full conditional density of parameter
λ is proportional to

\[
f_2(\lambda \mid \theta, n, x) \propto \sum_{x_b=0}^{x} \binom{n}{x-x_b}\, \theta^{x-x_b}(1-\theta)^{n-x+x_b}\,
\frac{e^{-\lambda}\lambda^{x_b}}{x_b!}\; \lambda^{a-1} e^{-b\lambda}.
\]

Starting from the current value for λ, say λ^(i−1), a candidate value λ^prop for λ can
be obtained as

\[
\lambda^{prop} = e^{\phi^{prop}}, \quad \text{where } \phi^{prop} \sim N(\phi^{(i-1)}, \tau_2^2)
\]

and φ^(i−1) = log λ^(i−1). In this way, the proposed value λ^prop will be defined in the
interval (0, ∞). The candidate value λ^prop is accepted with probability

\[
\alpha(\phi^{(i-1)}, \phi^{prop}) = \min\left\{1, \frac{f(\phi^{prop} \mid \theta^{(i)})}{f(\phi^{(i-1)} \mid \theta^{(i)})}\right\},
\]

where f(φ | θ) is the reparametrized full conditional density of parameter λ and
can be obtained as

\[
f(\phi \mid \theta) = e^{\phi} f_2(e^{\phi} \mid \theta, n, x).
\]

If the candidate λ^prop is accepted, it becomes the current value of the chain, i.e.,
λ^(i) = λ^prop; otherwise λ^(i) = λ^(i−1).


The two-block M-H algorithm can be summarized as follows:
Initialization: start with arbitrary values θ^(0) and λ^(0)
Iteration i:

1. Given θ^(i−1) and λ^(i−1),
   - Generate θ^prop according to f1(θ | λ^(i−1), n, x).
   - With probability α(θ^(i−1), θ^prop) accept θ^prop and set θ^(i) = θ^prop;
     otherwise reject θ^prop and set θ^(i) = θ^(i−1).
2. Given θ^(i) and λ^(i−1),
   - Generate λ^prop according to f2(λ | θ^(i), n, x).
   - With probability α(λ^(i−1), λ^prop) accept λ^prop and set λ^(i) = λ^prop;
     otherwise reject λ^prop and set λ^(i) = λ^(i−1).

Return {θ^(n_b+1), ..., θ^(N)} and {λ^(n_b+1), ..., λ^(N)},
where n_b is the burn-in period and N is the number of iterations.


Example 2.6 (Rice Quality, Continued) Consider again Example 2.3 where
prior uncertainty about θ was modeled by a Be(1, 45) distribution, and the
parameter λ was set equal to 0.001. For the purpose of the example here, a
gamma distribution with parameters a = 2 and b = 1000 is used to model
prior uncertainty about λ. The prior density Ga(2, 1000) is shown in Fig. 2.5.
It can be observed that the prior mass is concentrated at very small values of
λ.


> n=1000 
= Ale 


Fig. 2.5 Gamma prior distribution Ga(2, 1000) over λ for λ ∈ (0, 0.01)

Before running the algorithm, it is useful to introduce the following functions:
mh1 is used to obtain the candidate (current) value θ^prop (θ^(i−1)); mh2 is
used to calculate the probability of acceptance of the candidate value θ^prop;
dbinpois computes the product between a binomial likelihood Bin(n, θ) at
x − x_b and a Poisson likelihood at x_b.


> mh1=function(x){x/(1+x)}
> mh2=function(x){x/((1+x)^2)}
> dbinpois=function(xb){
+ dbinom((x-xb),n,theta)*dpois(xb,lambda)}


The MCMC algorithm is run over 15000 iterations, with a burn-in range of
5000 iterations.

> # theta, lambda, tau, thetav and lambdav are assumed to be
> # initialized beforehand
> n.iter=15000
> acct=n.iter
> accl=n.iter
> burn.in=5000
> for (i in 1:n.iter){
+ psicurr=log(theta/(1-theta))
+ s=sum(apply(xb,2,dbinpois))
+ pipsicurr=mh2(exp(psicurr))*dbeta(theta,a,b)*s
+ # Generate the candidate value of parameter theta
+ psiprop=rnorm(1,psicurr,tau[1])
+ theta=mh1(exp(psiprop))
+ s=sum(apply(xb,2,dbinpois))
+ pipsiprop=mh2(exp(psiprop))*dbeta(theta,a,b)*s
+ # acceptance/rejection of the candidate value
+ # (parameter theta)
+ if (runif(1)>pipsiprop/pipsicurr){
+ theta=mh1(exp(psicurr))
+ acct=acct-1}
+ thetav=c(thetav,theta)
+ # generate the candidate value of parameter lambda
+ phicurr=log(lambda)
+ s=sum(apply(xb,2,dbinpois))
+ piphicurr=exp(phicurr)*dgamma(lambda,ag,bg)*s
+ phiprop=rnorm(1,phicurr,tau[2])
+ lambda=exp(phiprop)
+ s=sum(apply(xb,2,dbinpois))
+ piphiprop=exp(phiprop)*dgamma(lambda,ag,bg)*s
+ # acceptance/rejection of the candidate value
+ # (parameter lambda)
+ if (runif(1)>piphiprop/piphicurr){
+ lambda=exp(phicurr)
+ accl=accl-1}
+ lambdav=c(lambdav,lambda)
+ }
> c(acct/n.iter,accl/n.iter)


M 03102000 © .ASTASS3S 


These values represent the acceptance rates for θ and λ, respectively.

The output of the simulation run is shown in Fig. 2.6, representing the
trace-plot, the autocorrelation plot (showing the correlation structure of the
sequences), and the histogram of the simulated draws for θ (left column) and λ
(right column). The simulated draws have an acceptance rate of approximately
31% for θ and 30% for λ. The trace-plots of simulated draws look like random
noise and the autocorrelation decreases rapidly as the time lag at which it is
calculated increases.


> par(mfrow=c(3,2))
> plot(thetav,type='l',xlab='Iterations',ylab=
+ expression(paste(theta)),main=expression(paste
+ (theta)))
> plot(lambdav,type='l',xlab='Iterations',ylab=
+ expression(paste(lambda)),main=expression(paste
+ (lambda)))
> acf(thetav[-c(1:burn.in)],type="correlation",ci=0,
+ main=expression(paste(theta)),ylab='')
> acf(lambdav[-c(1:burn.in)],type="correlation",ci=0,
+ main=expression(paste(lambda)),ylab='')
> hist(thetav[-c(1:burn.in)],xlab=expression(paste
+ (theta)),ylab='',main='')
> hist(lambdav[-c(1:burn.in)],xlab=expression(paste
+ (lambda)),ylab='',main='')

Note that the argument ci=0 in the function acf for computing and
plotting the estimate of the autocorrelation function suppresses the plot of
the confidence interval.

Fig. 2.6 MCMC diagnostic with trace-plots of simulated draws of θ (top left) and λ (top right),
autocorrelation plots over the last 10000 iterations (center) and histograms over the last 10000
iterations (bottom)


The simulated values θ^(n_b+1), ..., θ^(N) can serve as draws from the posterior
distribution f1(θ | λ, n, x). The posterior probability of hypothesis H1 can then be
approximated as

\[
\alpha_1 \approx \frac{1}{N - n_b} \sum_{i=n_b+1}^{N} \mathbb{1}\{\theta^{(i)} > 0.025\},
\tag{2.9}
\]

that is, the proportion of retained draws for which θ^(i) > 0.025, and the BF can be
obtained straightforwardly.


Example 2.7 (Rice Quality, Continued) Using a burn-in range of 5000 iterations,
the average value of parameter θ over the last 10000 iterations can be
computed as

> thetahat=mean(thetav[-c(1:burn.in)])
> thetahat
[1] 0.02788516

The posterior probability of hypothesis H1 can be approximated as in (2.9):

> alpha1=sum(thetav[-c(1:burn.in)]>th0)/
+ (n.iter-burn.in)
> alpha1
[1] 0.71

> post_odds=alpha1/(1-alpha1)
> post_odds
[1] 2.448276

Recall that the prior odds have been quantified previously as approximately
0.47. The Bayes factor then is

> post_odds/prior_odds
[1] 5.201569

The uncertainty about the presence of background elements, modeled by λ,
modifies the value of the BF from approximately 4.77 to 5.2. This change
is small. The BF still provides only weak support for the hypothesis H1 that
θ > 0.025, compared to H2.


2.2.3 Decision for a Proportion 


The normative framework for decision-making introduced in Chap. 1 is well suited
for addressing problems of statistical inference presented in this chapter. Consider
again a pair of competing propositions as defined in Sect. 2.2 regarding the question
of whether the proportion of items showing a target characteristic of interest is
greater (H1) or not greater (H2) than a given threshold θ0. From a decision-theoretic
point of view, two courses of action are possible: d1 and d2. Decision d1 amounts
to accepting the view that the proportion θ is greater than a given (legal) threshold,
θ0. Decision d2 amounts to accepting the view that θ is smaller than or equal to the
threshold θ0. A possible loss function L(·) for such a two-action decision problem
is

\[
L(d_1, \theta) =
\begin{cases}
0 & \text{if } \theta \in \Theta_1, \\
l_1(\theta_0 - \theta) & \text{if } \theta \in \Theta_2,
\end{cases}
\qquad
L(d_2, \theta) =
\begin{cases}
0 & \text{if } \theta \in \Theta_2, \\
l_2(\theta - \theta_0) & \text{if } \theta \in \Theta_1.
\end{cases}
\tag{2.10}
\]

This is a linear loss function where the loss is proportional to the magnitude of the
error (e.g., θ0 − θ). An example is shown in Fig. 2.7, where θ0 = 0.2, and loss values
l1 and l2 are equal to 1.

Fig. 2.7 Linear loss function L(d1, θ) (solid line) and L(d2, θ) (dashed line) in
(2.10) for θ0 = 0.2, l1 = 1, l2 = 1, Θ1 = (0.2, 1), Θ2 = (0, 0.2]

Given this loss function, the Bayesian posterior expected loss for d1, that is
accepting H1 : θ > θ0, is

\[
EL(d_1 \mid x) = \int_{\Theta_2} l_1 \theta_0 f(\theta \mid x)\, d\theta - \int_{\Theta_2} l_1 \theta f(\theta \mid x)\, d\theta,
\]

where f(θ | x) = Be(α* = α + x, β* = β + n − x). Similarly, the Bayesian
posterior expected loss for d2, that is accepting H2 : θ ≤ θ0, is

\[
EL(d_2 \mid x) = \int_{\Theta_1} l_2 \theta f(\theta \mid x)\, d\theta - \int_{\Theta_1} l_2 \theta_0 f(\theta \mid x)\, d\theta.
\]

After some algebra, it can be shown (Taroni et al., 2010) that

\[
EL(d_1 \mid x) = l_1 \theta_0 \Pr(\theta < \theta_0 \mid \alpha^*, \beta^*) - l_1 \frac{\alpha + x}{\alpha + \beta + n} \Pr(\theta < \theta_0 \mid \alpha^* + 1, \beta^*),
\tag{2.11}
\]

and

\[
EL(d_2 \mid x) = l_2 \frac{\alpha + x}{\alpha + \beta + n} \Pr(\theta > \theta_0 \mid \alpha^* + 1, \beta^*) - l_2 \theta_0 \Pr(\theta > \theta_0 \mid \alpha^*, \beta^*).
\tag{2.12}
\]

The decision criterion then is to decide d1 (d2) whenever EL(d1) is smaller (greater)
than EL(d2).


Example 2.8 (Counterfeit Medicines, Continued) Recall Example 2.1
where the competing propositions refer to the proportion of counterfeit
medicines that may be either greater or not greater than a given limiting
value, e.g., θ0 = 0.2. Consider a uniform prior Be(1, 1) for θ and the
finding that 12 out of 40 items are positive. Consider a linear loss function as
in (2.10), with l1 = 1 and l2 = 1. This is a symmetric loss, reflecting the
idea that falsely deciding that the proportion is greater than the threshold is
as undesirable, and hence as severely penalized, as falsely deciding that the
proportion is smaller than the threshold. The expected losses of decisions d1
and d2 are computed as in (2.11) and (2.12).

> th0=0.2
> a=1
> b=1
> n=40
> x=12
> l1=1
> l2=1
> ax=(a+x)/(a+b+n)
> ELd1=l1*th0*pbeta(th0,a+x,b+n-x)-
+ l1*ax*pbeta(th0,a+x+1,b+n-x)
> ELd2=l2*ax*pbeta(th0,a+x+1,b+n-x,lower.tail=F)-
+ l2*th0*pbeta(th0,a+x,b+n-x,lower.tail=F)
> c(ELd1,ELd2)
[1] 0.001207984 0.110731793

The optimal decision thus is d1, since it minimizes the expected loss. Given
prior beliefs, the observed data, and personal loss assignments, the optimal
course of action is to decide in favor of proposition H1 according to which the
proportion of counterfeit medicines is greater than 0.2.
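
The closed-form expressions (2.11) and (2.12) can be cross-checked against a direct numerical integration of the linear loss over the Beta posterior. The sketch below is illustrative only: the function names `expected_losses` and `numeric_losses` are hypothetical, and the tolerance passed to `integrate` is an arbitrary choice.

```r
# Closed-form expected losses (2.11) and (2.12) for a Be(a, b) prior
expected_losses <- function(x, n, a, b, th0, l1, l2) {
  a_post <- a + x                       # posterior Be(a_post, b_post)
  b_post <- b + n - x
  pmean  <- a_post / (a_post + b_post)  # (a + x) / (a + b + n)
  c(d1 = l1 * th0 * pbeta(th0, a_post, b_post) -
         l1 * pmean * pbeta(th0, a_post + 1, b_post),
    d2 = l2 * pmean * pbeta(th0, a_post + 1, b_post, lower.tail = FALSE) -
         l2 * th0 * pbeta(th0, a_post, b_post, lower.tail = FALSE))
}

# Direct numerical integration of the linear loss against the posterior
numeric_losses <- function(x, n, a, b, th0, l1, l2) {
  post <- function(th) dbeta(th, a + x, b + n - x)
  c(d1 = integrate(function(th) l1 * (th0 - th) * post(th),
                   0, th0, rel.tol = 1e-10)$value,
    d2 = integrate(function(th) l2 * (th - th0) * post(th),
                   th0, 1, rel.tol = 1e-10)$value)
}

# Setting of Example 2.8: 12 positives out of 40, uniform prior, th0 = 0.2
closed  <- expected_losses(12, 40, 1, 1, 0.2, 1, 1)
by_quad <- numeric_losses(12, 40, 1, 1, 0.2, 1, 1)
print(rbind(closed, by_quad))
```

With a symmetric loss (l1 = l2 = 1), the difference between the two expected losses equals the posterior mean minus the threshold, which provides a further quick check.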


A decision maker may find a “0 − l_i” loss function, as shown in Table 1.4, more
appropriate. Consider again the case discussed in Sect. 2.2.1 where it was of interest
to compare the hypotheses that the proportion of counterfeit medicines in a seizure
was greater (H1) or not greater (H2) than a given threshold θ0. In such a context,
the loss l1 (i.e., the loss incurred when deciding d1 and H2 is true) could amount
to the net loss represented by expenses incurred by issuing legal proceedings in a
non-priority case (i.e., falsely considering θ > θ0). In turn, loss l2 could amount
to monetary value of property that could have been confiscated by investigative
authorities in a meritorious case. Following results in Sect. 1.9, the decision criterion
becomes

\[
\frac{\alpha_1}{\alpha_2} > \frac{l_1}{l_2}
\quad \Longleftrightarrow \quad
\mathrm{BF} > \frac{l_1 / l_2}{\pi_1 / \pi_2}.
\]

Decision d1 is to be preferred to decision d2 if and only if the posterior odds in favor
of H1 are greater than the ratio of the losses of adverse outcomes or, alternatively, if
the BF is greater than the ratio between the loss ratio of adverse outcomes and the
prior odds.

Decision makers may find it difficult to assign losses l1 and l2. Note, however,
that when adverse outcomes are considered equally undesirable, then the loss ratio
simplifies to 1, and the decision criterion becomes to decide d1 whenever the
posterior odds are larger than 1, i.e., the posterior probability of hypothesis H1
is greater than the posterior probability of hypothesis H2. In turn, when adverse
consequences are not equally undesirable, a decision maker may consider how much
more (less) undesirable one adverse outcome is compared to the other. This can be
expressed as l1 = k l2, i.e., by specifying how much worse deciding d1 is when
θ ≤ θ0 is true, compared to deciding d2 when θ > θ0 is true (Biedermann et al.,
2016b). A sensitivity analysis can be performed for different values of k.
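
A sensitivity analysis over k can be sketched as follows. This is a minimal illustration under assumptions: it reuses the Be(13, 29) posterior of Example 2.8 (uniform prior, 12 positives out of 40), the grid of k values is arbitrary, and the variable names are hypothetical. Decision d1 is retained whenever the posterior odds in favor of H1 exceed k = l1/l2.

```r
# Posterior Be(13, 29) from Example 2.8 (uniform prior, 12 positives out of 40)
th0       <- 0.2
alpha1    <- pbeta(th0, 13, 29, lower.tail = FALSE)  # Pr(theta > th0 | x)
post_odds <- alpha1 / (1 - alpha1)

# k = l1 / l2: how much worse a false d1 is compared to a false d2
k <- c(0.5, 1, 2, 5, 10, 50, 100)

# "0 - l_i" loss: decide d1 if and only if the posterior odds exceed k
decision <- ifelse(post_odds > k, "d1", "d2")
print(data.frame(k = k, post_odds = post_odds, decision = decision))
```

The value of k at which the decision switches from d1 to d2 is the posterior odds itself, so the table shows how strongly a false d1 would have to be penalized before d2 becomes preferable.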


2.3 Normal Mean 


Toxicology laboratories are frequently asked to quantify the amount of target
substance (e.g., alcohol, illegal drugs, particular metabolites, etc.) in samples such
as blood, urine, and hair in order to help assess whether an unknown target quantity
θ (e.g., the level of alcohol in blood) exceeds a given value (e.g., a legal threshold).
Competing propositions of interest may be specified as follows:

H1: The target quantity θ exceeds a given level θ0.
H2: The target quantity θ is equal to or smaller than a given level θ0.

This section considers three main topics: (1) inference about an unknown quantity
θ (Sect. 2.3.1), (2) inference about θ in presence of factors influencing the
measurement process (Sect. 2.3.2), and (3) decision about competing propositions
regarding θ (Sect. 2.3.3).


2.3.1 Inference About a Normal Mean 


Consider the hypothetical case of a person, Mr. X, stopped by traffic police because
of suspicion of driving under the influence of a given substance (e.g., alcohol or
THC). A blood sample is taken and a series of analyses are performed by a forensic
laboratory. The propositions of interest may be, for example, that “The quantity θ
of target substance in Mr. X’s blood exceeds the legal threshold θ0” (H1) versus
the alternative proposition “The quantity θ of target substance in Mr. X’s blood is
smaller than or equal to the legal threshold θ0” (H2). A series of measurements x are
obtained. It is often reasonable to assume that such measurements follow a Normal
distribution N(θ, σ²):

\[
f(x \mid \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x - \theta)^2\right\},
\]

where the mean θ is the unknown quantity of target substance. The variance σ² can
be approximated from previous ad hoc calibrations (see discussion by Howson and
Urbach (1996)). The most common prior distribution for the Normal mean θ is itself
a Normal distribution N(μ, τ²):

\[
f(\theta \mid \mu, \tau^2) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left\{-\frac{1}{2\tau^2}(\theta - \mu)^2\right\},
\]

where the hyperparameters μ and τ² are often called prior mean and prior variance,
respectively.

The posterior distribution of the target quantity θ is still a Normal distribution,
denoted N(μ_x, τ_x²), because the Normal prior and the Normal likelihood are
conjugate. Generalizing the updating formulae (1.19) and (1.20) to the case where
a vector of n measurements (x_1, ..., x_n) is available leads to

\[
\mu_x = \frac{\sigma^2/n}{\sigma^2/n + \tau^2}\, \mu + \frac{\tau^2}{\sigma^2/n + \tau^2}\, \bar{x}
\tag{2.13}
\]

and

\[
\tau_x^2 = \frac{\tau^2 \sigma^2/n}{\sigma^2/n + \tau^2},
\tag{2.14}
\]

where \bar{x} = \sum_{i=1}^{n} x_i / n.
The posterior mean μ_x and the posterior variance τ_x² can be calculated by means
of the function post_distr. Note that its first argument, sigma, is the variance σ²
(not the standard deviation).

> # sigma: variance sigma^2; pm, pv: prior mean and prior variance
> post_distr=function(sigma,n,barx,pm,pv){
+ postm=(pm*sigma/n+barx*pv)/(sigma/n+pv)
+ postv=(pv*sigma/n)/(sigma/n+pv)
+ op=c(postm,postv)
+ return(op)}

The prior odds, the posterior odds, and the Bayes factor can be easily computed,
as discussed in Sect. 1.4, by means of standard routines (see Example 2.9). The
case where the population variance σ² is unknown and a prior distribution must be
specified for both parameters (θ, σ²) will be addressed in Sect. 3.3.2.


Example 2.9 (Alcohol Concentration in Blood) A person is stopped by traffic
police because of suspicion of driving under the influence of alcohol. Two
measurements are obtained by the laboratory, 0.4866 g/kg and 0.5078 g/kg.
The population variance σ² is known and is taken to be equal to 0.023².
Available information, e.g., the fact that the person has been stopped by
traffic police while driving late in the night, exceeding the speed limit etc.,
suggests a prior mean equal to 0.8 and a prior variance equal to 0.15², say
θ ~ N(μ = 0.8, τ² = 0.15²). This amounts to saying that, a priori, values for
the alcohol level in blood lower than 0.35 and larger than 1.25 are considered
extremely implausible (prior probabilities for values outside this range are on
the order of 0.01).
The propositions of interest are the following:

H1: The alcohol level θ in the blood of Mr. X exceeds the legal threshold
θ0 = 0.5 (θ > 0.5).

H2: The alcohol level in the blood of Mr. X is smaller than or equal to the
legal threshold θ0 = 0.5 (θ ≤ 0.5).

The prior odds can be easily computed as follows:

> th0=0.5
> pm=0.8
> pv=0.15^2
> pi1=pnorm(th0,pm,sqrt(pv),lower.tail=F)
> prior_odds=pi1/(1-pi1)
> prior_odds
[1] 42.95579

The probability of hypothesis H1 is, a priori, approximately 43 times greater
than the probability of the alternative hypothesis H2. Consider now the effect
of the measurements made on the blood sample.

> x=c(0.4866,0.5078)
> s2=0.023^2
> postm=post_distr(s2,length(x),mean(x),pm,pv)[1]
> postm
[1] 0.5007182

> postv=post_distr(s2,length(x),mean(x),pm,pv)[2]
> postv
[1] 0.0002614268

The posterior distribution of the quantity of alcohol in blood θ is, therefore,
approximately N(0.5007, 0.0003). The posterior odds are

> alpha1=pnorm(th0,postm,sqrt(postv),lower.tail=F)
> post_odds=alpha1/(1-alpha1)
> post_odds
[1] 1.073465

The ratio between posterior and prior odds gives the Bayes factor:

> BF=post_odds/prior_odds
> BF
[1] 0.02498999

The probability to obtain the two measurements if Mr. X’s alcohol level in
blood does not exceed the legal threshold θ0 = 0.5 is approximately 40
times greater than given the proposition that the blood alcohol level is greater
than the legal threshold. The evidence thus provides moderate support for the
hypothesis H2, compared to H1.

2.3.1.1 Choosing the Parameters of the Normal Prior for the Mean


If the experimenter has no reason to consider the distribution describing prior
uncertainty about the unknown quantity θ to be asymmetric, then a choice may
be made in the family of Normal distributions. When choosing a member from this
family, the analyst will need to assign a value to the prior mean μ and a value to
the prior standard deviation τ. To elicit a Normal prior, it is useful to recall that for
a Normal distribution θ ~ N(μ, τ²), approximately 99.7% of values are within 3
standard deviations from the mean, thus

\[
\Pr\{\mu - 3\tau < \theta < \mu + 3\tau\} \approx 0.997.
\]

Hence, if the practitioner can assign a measure of location μ and a pair of values
that define the upper and lower bounds of an interval that covers a range of plausible
values of the unknown quantity θ, then the standard deviation can be assigned as

\[
\tau = \frac{l_{up} - \mu}{3},
\tag{2.15}
\]

where l_up is the upper bound mentioned above. In Example 2.9, a prior location
was fixed at μ = 0.8. Moreover, prior probabilities for values smaller than 0.35
and greater than 1.25 were extremely small (i.e., on the order of 0.01). The standard
deviation has been elicited as in (2.15).

It may be worthwhile to inspect the reasonableness of the elicited prior. This includes,
as highlighted in Sect. 1.10, producing a graphical representation to see whether the
amount of available information is suitably conveyed. Consider a random sample of
size n_e from a Normal population providing an amount of information equivalent to
that conveyed by the prior. The equivalent sample size n_e can be found by matching the
prior variance τ² to the dispersion from the sample, σ²/n_e, and solving for n_e. The
smaller n_e, the weaker will be prior beliefs, and the more the posterior distribution
will be influenced by even a modest amount of data. Vice versa, the larger n_e, the
stronger will be the prior beliefs, and the more the posterior distribution will be
dominated by the prior. Thus, more data will be necessary to make a substantial
impact on prior beliefs.
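
The equivalent sample size can be computed in one line. The sketch below uses the values of Example 2.9 (σ² = 0.023², τ² = 0.15²); the names `n_e` and `prior_weight` are hypothetical and serve only to illustrate how n_e relates to the weight of the prior mean in the posterior mean (2.13).

```r
# Values from Example 2.9
s2 <- 0.023^2   # measurement variance sigma^2
pv <- 0.15^2    # prior variance tau^2

# Matching tau^2 = sigma^2 / n_e and solving for n_e
n_e <- s2 / pv

# Weight given to the prior mean in the posterior mean (2.13)
# when n measurements are available
prior_weight <- function(n) (s2 / n) / (s2 / n + pv)

print(c(n_e = n_e, weight_n2 = prior_weight(2)))
```

The weight of the prior mean can equivalently be written as n_e/(n_e + n), so an equivalent sample size well below 1, as here, indicates a prior that is quickly dominated by the data.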

Whenever the state of information is such as to consider all possible values of θ
equally plausible, a locally uniform prior can be defined:

\[
f(\theta) \propto \text{constant}.
\]

In the latter case, the posterior distribution of θ is a Normal distribution centered at
the sample mean x̄ with spread parameter equal to σ²/n (e.g., Bolstad & Curran,
2017).
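
As an illustration of the locally uniform case, the sketch below applies it to the two measurements of Example 2.9 and compares the result with the conjugate update (2.13) and (2.14) under a very large prior variance. This is an assumption-laden sketch: the value 1e8 for the prior variance and the variable names are arbitrary illustrative choices.

```r
x   <- c(0.4866, 0.5078)   # measurements from Example 2.9
s2  <- 0.023^2             # known variance sigma^2
n   <- length(x)
th0 <- 0.5

# Posterior under f(theta) proportional to a constant: N(mean(x), s2 / n)
postm_flat  <- mean(x)
postv_flat  <- s2 / n
alpha1_flat <- pnorm(th0, postm_flat, sqrt(postv_flat), lower.tail = FALSE)

# The conjugate update (2.13)-(2.14) approaches the same posterior
# when the prior variance becomes very large
pm <- 0.8
pv <- 1e8
postm_vague <- (pm * s2 / n + mean(x) * pv) / (s2 / n + pv)
postv_vague <- (pv * s2 / n) / (s2 / n + pv)

print(c(postm_flat = postm_flat, postm_vague = postm_vague,
        postv_flat = postv_flat, postv_vague = postv_vague,
        alpha1_flat = alpha1_flat))
```

Note that prior odds, and hence a Bayes factor, are not defined under an improper uniform prior; the sketch only reports the posterior probability that θ exceeds the threshold.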
2.3.1.2 Sensitivity to the Choice of the Prior Distribution


As noted in Sect. 1.11, the marginal likelihood is highly sensitive to the choice of
the prior distribution and so is the Bayes factor. Thus, it should be emphasized that
the BF obtained in Example 2.9, the value 0.02, does not depend on the data alone.
It also depends on the choice of the prior distribution on θ.

For the purpose of illustration, consider a sensitivity analysis for the
hyperparameters that characterize the prior distribution for the unknown level of
alcohol in blood. Let values of μ range from 0.4 to 1 and the prior variance τ² be
fixed and equal to 0.0225.


> pm=seq(0.4,1,0.01)
> pv=0.0225


The prior odds, the posterior odds, and the BF can be calculated for all possible
values of the prior mean μ (pm). Note that computing the posterior Normal
distribution with the function post_distr, using the 61 values of the prior mean μ
in pm, returns an output vector of length 62 whose first 61 elements represent the
posterior means, while the last element represents the posterior variance.


> th0=0.5
> pi1=pnorm(th0,pm,sqrt(pv),lower.tail=F)
> prior_odds=pi1/(1-pi1)
> x=c(0.4866,0.5078)
> s2=0.023^2
> postm=
+ post_distr(s2,length(x),mean(x),pm,pv)[1:length(pm)]
> postv=post_distr(s2,length(x),mean(x),pm,pv)
+ [length(pm)+1]
> alpha1=pnorm(th0,postm,sqrt(postv),lower.tail=F)
> post_odds=alpha1/(1-alpha1)
> BF=post_odds/prior_odds


Figure 2.8 shows the prior probability π1 of proposition H1, the posterior probability
α1, and the BF in favor of proposition H1 for values of the prior mean μ ranging
from 0.4 to 1.


> plot(pm,BF,type='l',ylim=c(0,max(pi1,alpha1,BF)),
+ xlim=range(pm),xlab=expression(paste(mu)),ylab='')
> lines(pm,pi1,lty=4)
> lines(pm,alpha1,lty=2)
> leg=expression(paste('BF'),paste(pi)[1],paste(alpha)
+ [1])
> legend(0.85,1.92,leg,lty=c(1,4,2))


Note that the BF favors proposition H1 (i.e., a BF greater than 1) over H2 only for
values of μ smaller than 0.47. Most importantly, one can observe the impact of the
prior assessments (i.e., different choices of the prior mean μ) on the value of the
BF. The higher the prior probability of proposition H1, the lower is the value of
the measurements x = (0.4866, 0.5078) in terms of the BF in favor of H1 over H2.
Note, however, that the BF in the latter case represents strong support for H2 over
H1.

Fig. 2.8 Sensitivity analysis of the prior probability π1 (dot-dashed line), posterior probability
α1 (dashed line), and BF (solid line) for values of μ ranging from 0.4 to 1 and τ² = 0.0225
(Example 2.9). Note that for a BF of 1 (dotted line), the lines of the prior and posterior probabilities
intersect

2.3.2 Continuous Measurements Affected by Errors


As noted in Sect. 2.2.2, a measurement process or observations may be affected by
background noise. Consider a case in which it is of interest to assess the height of an
individual based on video recordings made by a surveillance camera during a bank
robbery. Propositions of interest may be as follows:


H1: The height of the individual is less than 180 cm.
H2: The height of the individual is equal to or greater than 180 cm.


Assume that the height measurements x of an individual are normally distributed,
X ~ N(θ, σ²), where θ represents the true height of the individual and σ² represents
the variance of the measurement device. Assume also that the variance σ² is inferred
from previous ad hoc experiments. However, the measured height is, generally,
affected by an error ξ, related to the circumstances under which the recording was
made. Factors of interest here include the posture and movements of the person,
the type of clothing (including headwear and shoes) and lighting conditions. Such
circumstances represent a further source of variation δ², unrelated to σ². The
measured height is therefore X ~ N(θ + ξ, σ² + δ²). A conjugate Normal prior
distribution N(μ, τ²) is taken to model prior uncertainty about θ. The values of the
parameters ξ and δ² are case-specific assignments. It can be shown that the posterior
distribution of the true height θ is still Normal with mean

\[
\mu_x = \frac{\tau^2(\bar{x} - \xi) + \mu(\sigma^2 + \delta^2)/n}{\tau^2 + (\sigma^2 + \delta^2)/n}
\tag{2.16}
\]

and variance

\[
\tau_x^2 = \frac{\tau^2(\sigma^2 + \delta^2)/n}{\tau^2 + (\sigma^2 + \delta^2)/n}.
\tag{2.17}
\]


Example 2.10 (Image Analysis) Consider the hypothetical case introduced 
above and assume that, according to eyewitness testimony, the height of the 
perpetrator is approximately between 175 cm and 185 cm. This allows one to 
define a prior probability distribution for the height 6 centered at 180 cm with 
variance equal to 2.79 cm, i.e., 0 ~ N(180, 2.79). The standard deviation can 
be quantified as in (2.15): 


isup=185 
pm=180 
ps=(lsup-pm) /3 
Pv=ps~2 


Ww WA WP WE 


(continued) 
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Example 2.10 (continued) 
Thus, the two hypotheses Hı and Hp introduced above are, a priori, equally 
probable (hence, the prior odds equal 1). 


> th0=180 

> pil=pnorm(th0,pm,sqrt (pv) ) 
> prior_odds=pil/ (1-pil) 

> prior_odds 


tak a 


The available recordings depict an individual appearing in $n = 10$ images.
Height measurements yield the sample mean $\bar{x} = 180.25$. The variance of the
measurement procedure is known and equal to $\sigma^2 = 0.12$. The experimental
setting is such that the values for the parameters of the Normal distribution of
the error can be set to $\xi = 0.5$ and $\delta^2 = 1$.

> mx=180.25
> n=10
> s2=0.12
> xi=0.5
> d2=1

The posterior mean and the posterior variance of $\theta$ can be computed as
in (2.16) and (2.17), respectively.

> postm=(pv*(mx-xi)+pm*(s2+d2)/n)/(pv+(s2+d2)/n)
> postm
[1] 179.7597
> postv=(pv*(s2+d2)/n)/(pv+(s2+d2)/n)
> postv
[1] 0.1076592


The gray shaded area in Fig. 2.9 shows the posterior probability of the
hypothesis $H_1$. The posterior odds and the Bayes factor can be obtained
straightforwardly:

> alpha1=pnorm(th0,postm,sqrt(postv))
> post_odds=alpha1/(1-alpha1)
> post_odds
[1] 3.31109
> BF=post_odds/prior_odds
> BF
[1] 3.31109

Fig. 2.9 Posterior distribution $f(\theta \mid x)$ for the true height $\theta$ in Example 2.10. The gray shaded area shows the posterior probability of the hypothesis $H_1$ ($\theta < 180$ cm)

Given that the prior odds are 1, the BF is numerically equivalent to the
posterior odds. This value represents support for the hypothesis $H_1$ (the
height of the individual is lower than 180 cm) over $H_2$. Specifically, the BF
indicates that it is approximately 3 times more probable to obtain such height
measurements if the height of the individual is less than 180 cm than if the
height is equal to or greater than 180 cm.
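
The computations of Example 2.10 can be wrapped into two small helpers, which makes it easier to see how the Bayes factor reacts to the assumed systematic error. This is only a minimal sketch: the function names `post_height` and `bf_threshold`, and the grid of alternative values for $\xi$, are illustrative assumptions and not part of the original example.

```r
# Posterior mean and variance of the true height, following (2.16) and (2.17)
post_height <- function(mx, n, s2, xi, d2, pm, pv) {
  v <- (s2 + d2) / n                      # variance of the sample mean, error included
  c(mean = (pv * (mx - xi) + pm * v) / (pv + v),
    var  = pv * v / (pv + v))
}

# Bayes factor for H1: theta < th0 versus H2: theta >= th0
bf_threshold <- function(th0, pm, pv, postm, postv) {
  prior_p <- pnorm(th0, pm, sqrt(pv))
  post_p  <- pnorm(th0, postm, sqrt(postv))
  (post_p / (1 - post_p)) / (prior_p / (1 - prior_p))
}

pv   <- ((185 - 180) / 3)^2               # prior variance as in the example
post <- post_height(mx = 180.25, n = 10, s2 = 0.12, xi = 0.5, d2 = 1,
                    pm = 180, pv = pv)
bf   <- bf_threshold(180, 180, pv, post[["mean"]], post[["var"]])

# Hypothetical sensitivity check: Bayes factor for other values of xi
bf_by_xi <- sapply(c(0, 0.25, 0.5, 0.75), function(xi) {
  p <- post_height(180.25, 10, 0.12, xi, 1, 180, pv)
  bf_threshold(180, 180, pv, p[["mean"]], p[["var"]])
})
```

With the values of the example, `post` reproduces the posterior mean and variance shown above, and `bf` the Bayes factor for $H_1$ over $H_2$; `bf_by_xi` shows how strongly the result depends on the value assigned to $\xi$.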


2.3.3 Decision for a Mean 


The previous sections focused on how to draw a probabilistic inference about a 
Normal mean, using the Bayes factor. Recall that the competing propositions were: 


$H_1$: The target quantity $\theta$ exceeds a given level $\theta_0$.
$H_2$: The target quantity $\theta$ is equal to or smaller than a given level $\theta_0$.


A related question is how to decide whether or not a quantity of interest
is above a given (legal) threshold, i.e., accepting either $H_1$ (decision $d_1$) or $H_2$
(decision $d_2$). In order to address this question, it is necessary to introduce a loss
function to take into account the decision maker’s preferences. Suppose a linear loss
function is considered as in (2.18):
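$$L(d_1, \theta) = \begin{cases} 0 & \text{if } \theta > \theta_0, \\ l_1(\theta_0 - \theta) & \text{if } \theta \le \theta_0, \end{cases} \qquad L(d_2, \theta) = \begin{cases} 0 & \text{if } \theta \le \theta_0, \\ l_2(\theta - \theta_0) & \text{if } \theta > \theta_0. \end{cases} \tag{2.18}$$

The Bayesian posterior expected loss of decision $d_1$ can be computed as

$$EL(d_1 \mid x) = l_1 \int_{\theta \le \theta_0} (\theta_0 - \theta) f(\theta \mid x)\, d\theta = l_1 \tau_x \left[\phi(t) + t \int_{0}^{t} \phi(s)\, ds\right], \tag{2.19}$$

where $f(\theta \mid x)$ is a Normal posterior distribution with parameters $\mu_x$ and $\tau_x^2$ as
in (2.13) and (2.14), $t = \tau_x(\theta_0 - \mu_x)$, while $\phi(\cdot)$ denotes the probability density of
a standardized Normal distribution (Bernardo & Smith, 2000).

In turn, the Bayesian posterior expected loss of decision $d_2$ can be computed as

$$EL(d_2 \mid x) = l_2 \int_{\theta > \theta_0} (\theta - \theta_0) f(\theta \mid x)\, d\theta = l_2 \tau_x \left[\phi(t) - t \int_{t}^{\infty} \phi(s)\, ds\right]. \tag{2.20}$$

Again, the decision criterion amounts to deciding $d_1$ ($d_2$) whenever $EL(d_1 \mid x)$ is
smaller (greater) than $EL(d_2 \mid x)$.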


Example 2.11 (Alcohol Concentration in Blood, Continued) Recall Example 2.9,
where the posterior distribution of the alcohol level $\theta$ was
$N(0.50072, 0.00026)$, and the legal threshold was equal to 0.5.

> th0=0.5
> postm
[1] 0.5007182
> postv
[1] 0.0002614268

Consider a symmetric linear loss function as in (2.18) with $l_1 = l_2 = 1$. The
Bayesian posterior expected losses in (2.19) and (2.20) can be obtained as

> l1=1
> l2=1
> t=sqrt(postv)*(th0-postm)
> eld1=l1*sqrt(postv)*(dnorm(t)+t*(pnorm(t)-0.5))
> eld2=l2*sqrt(postv)*(dnorm(t)-t*pnorm(t,lower.tail=F))
> c(eld1,eld2)
[1] 0.006450377 0.006450471

The optimal decision thus is to consider that the alcohol level is greater than
the legal threshold because this decision has a lower expected loss, though the
difference between the two expected losses is, in the example here, extremely
small:

> abs(eld1-eld2)
[1] 9.388144e-08

Note that this result crucially depends on the decision maker’s value assessments
(i.e., the chosen loss function).


When expected losses for rival decisions are very similar, as is the case in 
Example 2.11, a sensitivity analysis should be performed as suggested, for example, 
in legal literature (Edwards, 1988). The sensitivity analysis should evaluate the 
effect of changes in the prior parameters and the loss values. See also Sect. 2.3.1 
for a sensitivity analysis of the BF for evaluating the impact of changes in 
hyperparameters characterizing the prior distribution for the unknown level of 
alcohol in blood. 

It is also worth reflecting on the choice of the loss function. A symmetric loss
function, as previously suggested, may not realistically reflect the decision maker’s
preferences. For example, a decision maker who is concerned about road safety may
consider that falsely concluding that an individual’s blood alcohol concentration
is below the legal limit is a more serious error than falsely concluding that an
individual’s blood alcohol concentration is above the legal threshold. Therefore, $l_2$
may be taken to be larger than $l_1$, reflecting the greater inconvenience associated
with underestimating the alcohol concentration. For example, when $l_1 = 1$ and
$l_2 = 2$, meaning that underestimating the alcohol level is considered twice as serious
as overestimating it, the expected loss of decision $d_2$ will increase. One can verify
that for any reasonable value of $l_2$ greater than $l_1$, decision $d_1$ will be the one with
the smaller expected loss.
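
The defining integrals of the expected losses can also be evaluated directly by numerical integration, which avoids relying on any particular closed-form expression. The sketch below does this with `integrate()` for the posterior of Example 2.11. It additionally compares the result with the usual closed forms for a Normal posterior on the standardized scale $z = (\theta_0 - \mu_x)/\tau_x$, namely $\tau_x[\phi(z) + z\Phi(z)]$ and $\tau_x[\phi(z) - z(1 - \Phi(z))]$; this standardization, the helper names, and the integration limits of ten posterior standard deviations are assumptions of the sketch. The values obtained in this way need not coincide with the outputs printed in Example 2.11.

```r
# Posterior N(postm, postv) and legal threshold, as printed in Example 2.11
postm <- 0.5007182
postv <- 0.0002614268
th0   <- 0.5
sdx   <- sqrt(postv)

# Expected loss of d1 (decide theta > th0): loss l1 * (th0 - theta) applies when theta <= th0
el_d1 <- function(l1) {
  l1 * integrate(function(th) (th0 - th) * dnorm(th, postm, sdx),
                 lower = postm - 10 * sdx, upper = th0, rel.tol = 1e-9)$value
}
# Expected loss of d2 (decide theta <= th0): loss l2 * (theta - th0) applies when theta > th0
el_d2 <- function(l2) {
  l2 * integrate(function(th) (th - th0) * dnorm(th, postm, sdx),
                 lower = th0, upper = postm + 10 * sdx, rel.tol = 1e-9)$value
}

# Closed forms on the standardized scale (an assumption of this sketch)
z     <- (th0 - postm) / sdx
cf_d1 <- sdx * (dnorm(z) + z * pnorm(z))
cf_d2 <- sdx * (dnorm(z) - z * pnorm(z, lower.tail = FALSE))

sym  <- c(d1 = el_d1(1), d2 = el_d2(1))   # symmetric loss, l1 = l2 = 1
asym <- c(d1 = el_d1(1), d2 = el_d2(2))   # asymmetric loss, l1 = 1, l2 = 2
```

A quick consistency check for any implementation is that, with $l_1 = l_2 = l$, the two expected losses must differ by exactly $l(\mu_x - \theta_0)$, because $(\theta - \theta_0)^+ - (\theta_0 - \theta)^+ = \theta - \theta_0$. Under the asymmetric loss, decision $d_1$ retains the smaller expected loss, in line with the remark above.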


2.4 Summary of R Functions 


The R functions outlined below have been used in this chapter. 
Functions Available in the Base Package


apply: applies a function to the margins (either rows or columns) of a matrix

acf: computes and plots estimates of the autocorrelation function

d<name of distribution>, p<name of distribution>, r<name of distribution> (e.g., dbeta, pbeta, rbeta): calculate the
density and the cumulative probability and generate random numbers for various
parametric distributions


rowSums: forms row sums for numeric arrays (or data frames) 


Further details can be found in the Help menu, help.start(). 


Functions Available in Other Packages 


dbbinom and pbbinom in package extraDistr: calculate the density and the
cumulative probability for a beta-binomial distribution


Functions Developed in This Chapter 


dbinpois: computes the product between a binomial likelihood $Bin(n, \theta)$ at $x - x_b$
and a Poisson likelihood $Pn(\lambda)$ at $x_b$, where $x$ represents the number of items
counted as presenting a given target characteristic and $x_b$ represents the number of
background elements affecting the counting process


Usage: dbinpois(xb)

Arguments: xb: a vector of integers ranging from 0 to x 

Output: a vector of values, where each value represents the probability of the product 
between the binomial and the Poisson likelihood at a given value of the input 
argument xb 


mh1: computes the function $x/(1 + x)$
Usage: mh1(x)

Arguments: x: a scalar value $x$

Output: the value of $x/(1 + x)$


mh2: computes the function $x/(1 + x)^2$
Usage: mh2(x)

Arguments: x: a scalar value $x$

Output: the value of $x/(1 + x)^2$
post_distr: computes the posterior distribution $N(\mu_x, \tau_x^2)$ of a Normal mean $\theta$,
with $X \sim N(\theta, \sigma^2)$ and $\theta \sim N(\mu, \tau^2)$

Usage: post_distr(sigma,n,barx,pm,pv)

Arguments: sigma, the variance $\sigma^2$ of the observations; n, the number of observations;
barx, the sample mean $\bar{x}$ of the observations; pm, the mean $\mu$ of the prior
distribution $N(\mu, \tau^2)$; and pv, the variance $\tau^2$ of the prior distribution $N(\mu, \tau^2)$

Output: a vector of two values: the first is the posterior mean $\mu_x$ and the second is
the posterior variance $\tau_x^2$
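
For orientation, a function with this interface could look as follows. This is a minimal sketch assuming the standard Normal-Normal conjugate update with known variance; the name `post_distr_sketch` is hypothetical, and the implementation developed in the chapter may differ in detail.

```r
# Posterior of a Normal mean with known data variance (conjugate update).
# sigma: variance of the observations, n: sample size, barx: sample mean,
# pm, pv: mean and variance of the Normal prior.
post_distr_sketch <- function(sigma, n, barx, pm, pv) {
  se2 <- sigma / n                                   # variance of the sample mean
  post_mean <- (pv * barx + pm * se2) / (pv + se2)   # precision-weighted average
  post_var  <- (pv * se2) / (pv + se2)
  c(post_mean, post_var)
}

# Illustrative call with made-up values: equal weight on prior and data
res <- post_distr_sketch(sigma = 4, n = 1, barx = 10, pm = 0, pv = 4)
```

When the prior variance equals the variance of the sample mean, as in the illustrative call, the posterior mean lies halfway between the prior mean and the sample mean and the posterior variance is halved.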

Published with the support of the Swiss National Science Foundation (Grant no. 10BP12_208532/1).
Chapter 3
Bayes Factor for Evaluative Purposes


3.1 Introduction 


Consider a case where material of known source (control material) and evidential 
material of unknown source (recovered or questioned material) are collected 
and analyzed. Interpretation of scientific evidence then amounts to assessing the 
probative value of the observations made during comparative examinations. The 
evidence is evaluated in terms of its effect on the odds in favor of a proposition $H_1$
put forward by the prosecution, compared to an alternative proposition $H_2$ advanced
by the defense.

During comparative examinations, observations and measurements are made, 
leading to either discrete or continuous data. Forensic laboratories may also have 
equipment and methodologies that can lead to output in the form of multivariate 
data. Thus, scientific evidence is often described by more than one variable. For 
example, glass fragments from a crime scene can be compared with fragments 
collected on the clothing of a person of interest on the basis of several chemical 
components, as well as physical characteristics. It should be noted, however, that the 
assessment of a Bayes factor for multivariate data may be challenging. For example,
data may not present enough regularity for standard parametric distributions
to be used. Data may also present a complex dependence structure with several
levels of variation. In addition, a feature-based approach might not always be
feasible, and it may be necessary to derive a Bayes factor on the basis of scores.

This chapter is structured as follows. Sections 3.2 and 3.3 address the problem 
of evaluation of evidence for various types of discrete and continuous data, 
respectively. Section 3.4 presents an extension to continuous multivariate data. 


Supplementary Information The online version contains supplementary material available at
https://doi.org/10.1007/978-3-031-09839-0_3.
3.2 Evidence Evaluation for Discrete Data


This section deals with measurement results in the form of counts, using the 
binomial model (Sect. 3.2.1), the multinomial model (Sect. 3.2.2), and the Poisson 
model (Sect. 3.2.3). 


3.2.1 Binomial Model 


In many practical applications, data derive from realizations of experiments that 
may take one of two mutually exclusive outcomes. Examples include general 
features (so-called class characteristics) observed on questioned and known items 
or materials (e.g., fired bullets, fibers) when the question of interest is whether the 
compared materials come from the same source. 

Consider a hypothetical case involving a questioned document for which results 
of analyses of black toner are available. On the questioned document, black bi- 
component toner is present. It is of the same type as that used by a given printing 
machine (known source). A question that may be of interest in such a case is how 
this analytical information should affect one’s belief in the proposition according 
to which the questioned document has been printed using the device of interest 
(Biedermann et al., 2009, 2011a). The competing propositions can thus be defined 
as follows: 


$H_1$: The questioned document has been printed with the device of interest.
$H_2$: The questioned document has been printed with an unknown device.


Let $T$ denote the observed toner type, either single component ($T_S$) or
bi-component ($T_B$). Suppose that a database of the toner type (magnetism) of samples
of black toner from $N$ machines is available, $n$ of which use a bi-component toner.
Denote by $\theta$ the proportion of the population of printing devices equipped with
bi-component toner. Available counts can be treated as realizations of Bernoulli trials
(Sect. 2.2.1) with constant probability of success $\theta$, $\Pr(T_B \mid \theta) = \theta$. Suppose a
conjugate beta prior distribution $Be(\alpha, \beta)$ is used to model uncertainty about $\theta$,
where $\alpha$ and $\beta$ can be elicited using the available background knowledge as in (1.42)
and (1.43).

Denote by $E_y$ the observations made on recovered material and by $E_x$ the
observations made on control material (i.e., documents printed with the device of
interest). If the questioned document originates from the device of interest, the
probability of the evidence becomes

$$\Pr(E_y = T_B, E_x = T_B \mid H_1) = \int_\Theta \Pr(T_B \mid \theta)\, \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta / B(\alpha, \beta) = \int_\Theta \theta \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta / B(\alpha, \beta).$$

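If the questioned document originates from an unknown device (i.e., two distinct
devices have been used), the probability of the evidence becomes

$$\Pr(E_y = T_B, E_x = T_B \mid H_2) = \int_\Theta \theta^2 \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta / B(\alpha, \beta).$$

The Bayes factor can be computed as

$$BF = \frac{\int_\Theta \theta \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta}{\int_\Theta \theta^2 \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta} = \frac{B(\alpha+1, \beta)}{B(\alpha+2, \beta)} \cdot \frac{\int_\Theta \theta^{\alpha}(1-\theta)^{\beta-1} / B(\alpha+1, \beta)\, d\theta}{\int_\Theta \theta^{\alpha+1}(1-\theta)^{\beta-1} / B(\alpha+2, \beta)\, d\theta} = \frac{\alpha+\beta+1}{\alpha+1}. \tag{3.1}$$
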


Example 3.1 (Questioned Documents) Consider the case of a printed document
of unknown origin. Analyses reveal that the toner present on the printed
document is of type “bi-component.” The printing device that is thought
to have been used to print the questioned document is equipped with a
bi-component toner. In an available database with a total of $N = 100$ samples of
black toner, $n = 23$ are bi-component (see Table 3.1). Using this information,
the parameters of the beta prior distribution about $\theta$ can be elicited as follows:

> n=23
> N=100
> p=n/N
> a=p*(N-1)
> b=(1-p)*(N-1)

This leads to a $Be(22.77, 76.23)$, i.e., approximately a $Be(23, 76)$.
The Bayes factor in (3.1) can be computed straightforwardly as follows:

> BF=(a+b+1)/(a+1)
> BF
[1] 4.206984

The Bayes factor provides weak support for the proposition $H_1$ according to
which the questioned document has been printed with the printing device of
interest rather than with an unknown printing device ($H_2$).
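
The closed form in (3.1) can be cross-checked numerically, since the Bayes factor is the ratio of the first to the second prior moment of $\theta$. The following sketch, whose variable names are illustrative, compares the closed form with a ratio of beta functions and with direct numerical integration against the beta prior density.

```r
# Beta prior hyperparameters elicited as in Example 3.1
n <- 23; N <- 100
p <- n / N
a <- p * (N - 1)
b <- (1 - p) * (N - 1)

# Closed form (3.1)
bf_closed <- (a + b + 1) / (a + 1)

# Same quantity as a ratio of beta functions, B(a+1, b) / B(a+2, b)
bf_beta <- beta(a + 1, b) / beta(a + 2, b)

# Same quantity as E[theta] / E[theta^2] under the Be(a, b) prior
num <- integrate(function(th) th   * dbeta(th, a, b), 0, 1)$value
den <- integrate(function(th) th^2 * dbeta(th, a, b), 0, 1)$value
bf_numeric <- num / den

c(bf_closed, bf_beta, bf_numeric)
```

All three values agree up to numerical error, which illustrates that the Bayes factor here depends on the data only through the prior moments of $\theta$.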


Table 3.1 Results obtained following the analysis of, respectively, the component type (magnetism) and the resin type of 100 samples of black toner (Biedermann et al., 2011a)

Resin group               Single component   Bi-component
1. Styrene-co-acrylate    69                 14
2. Epoxy A                8                  3
3. Epoxy B                0                  2
4. Epoxy C                0                  1
5. Epoxy D                0                  1
6. Polystyrene            0                  1
7. Other                  0                  1


It is worth noting that there is an alternative development described in the
forensic statistics literature that considers background information derived from a
population database as part of the evidence (e.g., Ommen et al., 2016; Dawid,
2017). According to this line of reasoning, if proposition $H_1$ is true (numerator),
there are $(n + 1)$ counts of bi-component toners. That is, the questioned item and
the known item are assumed to come from the same source, hence adding one count
to the database. Conversely, if proposition $H_2$ is true (denominator), there are $(n + 2)$
counts of bi-component toner. Here, it is assumed that the questioned item and the
known item come from different sources, hence adding two counts to the database.
The Bayes factor can then be obtained as

$$BF = \frac{\int_\Theta \theta \cdot \theta^{n}(1-\theta)^{N-n}\, \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta}{\int_\Theta \theta^2 \cdot \theta^{n}(1-\theta)^{N-n}\, \theta^{\alpha-1}(1-\theta)^{\beta-1}\, d\theta} = \frac{\alpha+\beta+N+1}{\alpha+n+1}. \tag{3.2}$$

One can immediately verify that this corresponds to the BF in (3.1) with parameter
$\alpha$ replaced by $\alpha + n$, and parameter $\beta$ replaced by $\beta + N - n$. However, it may be
questioned whether the available database should be considered as evidence, rather
than as conditioning information, because the database contains only general data
unrelated to the case under investigation (Aitken et al., 2021).
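
The relation between (3.1) and (3.2) can be illustrated with a short sketch. The uniform $Be(1, 1)$ prior used here is a hypothetical choice made only for illustration, as are the function names; the database counts are those of Table 3.1.

```r
# Bayes factor (3.1): database used only to inform the prior Be(alpha, beta)
bf_prior_only <- function(alpha, beta) (alpha + beta + 1) / (alpha + 1)

# Bayes factor (3.2): database counts (n out of N) treated as part of the evidence
bf_with_database <- function(alpha, beta, n, N) {
  (alpha + beta + N + 1) / (alpha + n + 1)
}

# Hypothetical uniform prior Be(1, 1), with n = 23 bi-component toners out of N = 100
bf_db    <- bf_with_database(alpha = 1, beta = 1, n = 23, N = 100)
# Same result from (3.1) after updating alpha -> alpha + n, beta -> beta + N - n
bf_equiv <- bf_prior_only(alpha = 1 + 23, beta = 1 + 100 - 23)

c(bf_db, bf_equiv)
```

Both calls return 103/25 = 4.12, confirming that (3.2) is (3.1) evaluated at the updated beta parameters.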


3.2.2 Multinomial Model 


The analyses described in Sect. 3.2.1 can be extended to situations where experi- 
ments can lead to more than two mutually exclusive outcomes. 

Consider again the case involving printed documents, introduced in Sect. 3.2.1.
Laboratories often analyze resins of toner on printed documents by means of
Fourier transform infrared spectroscopy (FTIR). The results can be classified into one of
several ($k$) categories (Table 3.1). Suppose that the resin type ($R$) recovered on
the questioned document belongs to category $j$, which is also found in the toner
used by a given printing machine. The question of interest is similar to the one
considered in Sect. 3.2.1, that is, how the available analytical information should
affect one’s belief in the proposition according to which a questioned document has
been printed using a given device, called the potential source, rather than by some
unknown printing device.
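Denote by $\theta_j$ the proportion of the population that is of type (category) $R_j$,
$j = 1, \dots, k$, $\Pr(R_j \mid \theta_j) = \theta_j$. Assume that observations of distinct categories can
be treated as independent: available counts $n_1, \dots, n_k$ can be treated as realizations
from a multinomial distribution $Mult(N, \theta_1, \dots, \theta_k)$:

$$f(n_1, \dots, n_k \mid \theta_1, \dots, \theta_k) = \frac{N!}{n_1! \cdots n_k!}\, \theta_1^{n_1} \cdots \theta_k^{n_k}.$$

A conjugate Dirichlet prior probability distribution $Dir(\alpha_1, \dots, \alpha_k)$ is considered
for modeling uncertainty about the population proportions $\theta_1, \dots, \theta_k$:

$$f(\theta_1, \dots, \theta_k \mid \alpha_1, \dots, \alpha_k) = \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1} / B(\boldsymbol{\alpha}),$$

with $B(\boldsymbol{\alpha}) = \prod_{i=1}^{k} \Gamma(\alpha_i) / \Gamma(\alpha)$ and $\alpha = \sum_{i=1}^{k} \alpha_i$.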

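Denote by $E_y$ the observations made on the recovered material and by $E_x$ the
observations made on the control material (i.e., documents printed with the device
of interest). If the questioned document originates from the device of interest, the
probability of the findings $E = (E_y, E_x)$ becomes

$$\Pr(E_y = R_j, E_x = R_j \mid H_1) = \int_\Theta \Pr(R_j \mid \theta_j)\, \theta_1^{\alpha_1-1} \cdots \theta_k^{\alpha_k-1}\, d\boldsymbol{\theta} / B(\boldsymbol{\alpha}) = \int_\Theta \theta_1^{\alpha_1-1} \cdots \theta_j^{\alpha_j} \cdots \theta_k^{\alpha_k-1}\, d\boldsymbol{\theta} / B(\boldsymbol{\alpha}).$$

If the questioned document originates from an unknown device (i.e., two distinct
devices have been used), the probability of the findings $E$ becomes

$$\Pr(E_y = R_j, E_x = R_j \mid H_2) = \int_\Theta \theta_1^{\alpha_1-1} \cdots \theta_j^{\alpha_j+1} \cdots \theta_k^{\alpha_k-1}\, d\boldsymbol{\theta} / B(\boldsymbol{\alpha}).$$

The Bayes factor can be computed as

$$BF = \frac{\int_\Theta \theta_1^{\alpha_1-1} \cdots \theta_j^{\alpha_j} \cdots \theta_k^{\alpha_k-1}\, d\boldsymbol{\theta}}{\int_\Theta \theta_1^{\alpha_1-1} \cdots \theta_j^{\alpha_j+1} \cdots \theta_k^{\alpha_k-1}\, d\boldsymbol{\theta}} = \frac{\alpha + 1}{\alpha_j + 1}. \tag{3.3}$$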




Example 3.2 (Questioned Documents, Continued) Recall Example 3.1,
involving questioned documents on which black toner is present. Suppose
now that laboratory analyses focus on the toner’s resin component. Suppose
that the parameters of the Dirichlet prior probability distribution are elicited
as

> a=c(15,4,...)

Suppose that the rather common resin group Epoxy-A (category $j = 2$ in
Table 3.1) is observed on both the questioned and known documents. The
Bayes factor in (3.3) can be computed straightforwardly as

> j=2
> BF=(sum(a)+1)/(a[j]+1)
> BF
[1] 6.2

The Bayes factor provides, again, weak support for the proposition $H_1$
according to which the questioned document has been printed with the
printing device of interest, rather than with an unknown printing device ($H_2$).
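
The Bayes factor in (3.3) depends on the Dirichlet hyperparameters only through their sum $\alpha$ and the component $\alpha_j$. The sketch below illustrates this with a hypothetical hyperparameter vector for $k = 7$ categories (the values are assumed purely for illustration, chosen so that $\alpha_2 = 4$ and $\alpha = 30$) and checks the closed form by Monte Carlo, simulating Dirichlet draws as normalized independent gamma variables.

```r
# Hypothetical Dirichlet hyperparameters for k = 7 resin categories (assumed values)
a <- c(15, 4, 3, 2, 2, 2, 2)
j <- 2

# Closed form (3.3)
bf_closed <- (sum(a) + 1) / (a[j] + 1)

# Monte Carlo check: Dirichlet draws obtained by normalizing gamma variables
set.seed(123)
nsim  <- 200000
g     <- matrix(rgamma(nsim * length(a), shape = rep(a, each = nsim)), nrow = nsim)
theta <- g / rowSums(g)                    # each row is one draw from Dir(a)
bf_mc <- mean(theta[, j]) / mean(theta[, j]^2)

c(bf_closed, bf_mc)
```

The simulated ratio $E[\theta_j] / E[\theta_j^2]$ should be close to the closed-form value, up to Monte Carlo error.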


Suppose that a database of the resin type of samples of black toner from
$N$ machines is available, $n_1$ ($n_2, \dots$) of which belong to category 1 (2, ...),
as in Table 3.1. These data can be used to elicit the Dirichlet prior probability
distribution. Following the methodology proposed by Zapata-Vazquez et al. (2014),
the hyperparameters $\alpha_1, \dots, \alpha_k$ can be assessed by starting from expert judgments
(e.g., a vector of quantiles) about proportions of items belonging to each category.
Tools for eliciting prior probability distributions from experts’ opinions are also
available in the R package SHELF. An example will be presented in Sect. 4.2.2.


3.2.3 Poisson Model 


Some forensic science applications focus on the number of occurrences of particular 
events or observations that take place at given intervals of time or space. Practical 
examples are the number of gunshot residue particles (GSR) collected on the surface 
of the hands of individuals suspected to be involved in the discharge of a firearm 
(Cardinetti et al., 2006), or the number of corresponding matching striations in the 
comparative examination of marks left by firearms on fired bullets (Bunch, 2000). 

Consider the following hypothetical case. A fired bullet is found at a crime scene, 
and a person of interest is apprehended, carrying a gun. The following propositions 
are of interest: 
$H_1$: The bullet found at the crime scene was fired with the seized gun.
$H_2$: The bullet found at the crime scene was fired with an unknown gun.


The recovered bullet and bullets fired with the seized gun are compared. Consecutive
matching striations (CMS) is a simple concept for quantifying the extent of agreement
between marks. The number of observed consecutively matching striations can
be interpreted as a score. Let $\Delta(x, y)$ be the maximum CMS count for a given
comparison. For the evaluation of a CMS count, data on comparisons made between
pairs of bullets test-fired with the seized gun and between pairs of bullets test-fired
with different guns are needed. The (score-based) Bayes factor therefore is

$$sBF = \frac{g(\Delta(x, y) \mid H_1)}{g(\Delta(x, y) \mid H_2)}.$$

A statistical model commonly used in the forensic science literature for the type
of data encountered in the example here assumes that counts follow a Poisson
distribution $Pn(\lambda_i)$:

$$g(\Delta(x, y) \mid \lambda_i) = \frac{e^{-\lambda_i} \lambda_i^{\Delta(x, y)}}{\Delta(x, y)!}, \quad \Delta(x, y) = 0, 1, \dots; \quad \lambda_i > 0,$$

where parameter $\lambda_i$, $i = 1, 2$, represents the weighted average maximum CMS
count.

Suppose that two datasets are compiled. The first relates to pairs of bullets fired
with the seized gun, and the second to pairs of bullets fired with different guns.
Such data can be used to inform the probability distribution $g(\cdot)$ at the score value
$\Delta(x, y)$ as discussed in Sect. 1.5.2 and to compute the Bayes factor as

$$sBF = \frac{g(\Delta(x, y) \mid x, H_1)}{g(\Delta(x, y) \mid H_2)}.$$

Bunch (2000) describes a likelihood ratio procedure for inference about competing
propositions. This account is based on a frequentist perspective because it uses
the maximum likelihood estimates $\hat{\lambda}_1$ and $\hat{\lambda}_2$ for parameters $\lambda_1$ and $\lambda_2$, calculated
under the assumption that either proposition $H_1$ or proposition $H_2$ is true. Using
these two estimates in the component Poisson likelihoods leads to the following
likelihood ratio:

$$LR = \frac{\hat{\lambda}_1^{\Delta(x, y)}\, e^{-\hat{\lambda}_1}}{\hat{\lambda}_2^{\Delta(x, y)}\, e^{-\hat{\lambda}_2}}.$$


In Bayesian statistics, the most common prior distribution for $\lambda_i$ is the gamma
distribution $Ga(\alpha_i, \beta_i)$ with shape parameter $\alpha_i$ and rate parameter $\beta_i$ (e.g., Bernardo
and Smith, 2000):

$$f(\lambda_i \mid \alpha_i, \beta_i) = \frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)}\, \lambda_i^{\alpha_i - 1} e^{-\beta_i \lambda_i}, \quad \lambda_i > 0; \quad \alpha_i, \beta_i > 0.$$

Since the Poisson and gamma distributions are conjugate (Sect. 1.10), the posterior
distribution of $\lambda$ is still in the family of gamma distributions, with parameters $\alpha$
and $\beta$ updated according to well-known updating rules (see, e.g., Lee, 2012). When
we have a realization of a random sample from a Poisson distribution, $Pn(\lambda)$, say
$(z_1, \dots, z_n)$, we end up with a $Ga(\alpha', \beta')$, where $\alpha' = \alpha + \sum_{i=1}^{n} z_i$ and $\beta' = \beta + n$.
Note that in this case there is only one observation, $\Delta(x, y)$; therefore, $\alpha' = \alpha + \Delta(x, y)$
and $\beta' = \beta + 1$. See also Biedermann et al. (2011b) for further illustrations
of the Poisson-gamma model in forensic science applications.
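
The updating rule can be written as a one-line function. In the sketch below, the function name `gamma_update` and the illustrative values (a $Ga(125, 32)$ prior and a single observed count of 4) are assumptions chosen for illustration; the code also verifies conjugacy numerically by comparing the updated gamma density with the normalized product of prior and likelihood.

```r
# Conjugate update of a Ga(alpha, beta) prior given Poisson counts z
gamma_update <- function(alpha, beta, z) {
  c(shape = alpha + sum(z), rate = beta + length(z))
}

# Illustrative values: Ga(125, 32) prior and a single observed count of 4
post <- gamma_update(alpha = 125, beta = 32, z = 4)

# Numerical check of conjugacy on a grid of lambda values
lambda   <- seq(2, 6, by = 0.5)
unnorm   <- function(l) dpois(4, l) * dgamma(l, shape = 125, rate = 32)
marginal <- integrate(unnorm, lower = 0, upper = Inf)$value
by_bayes <- unnorm(lambda) / marginal
by_rule  <- dgamma(lambda, shape = post[["shape"]], rate = post[["rate"]])
max(abs(by_bayes - by_rule))
```

With a single observation, the shape increases by the observed count and the rate by one, so the posterior here is a $Ga(129, 33)$.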

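The marginal distribution in the numerator and denominator of the Bayes factor
is known in closed form here. It is a Poisson-gamma distribution:

$$g(\Delta(x, y) \mid \alpha_i, \beta_i) = \int g(\Delta(x, y) \mid \lambda_i)\, f(\lambda_i \mid \alpha_i, \beta_i)\, d\lambda_i = \frac{1}{\Delta(x, y)!}\, \frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)}\, \frac{\Gamma(\alpha_i + \Delta(x, y))}{(\beta_i + 1)^{\alpha_i + \Delta(x, y)}}. \tag{3.4}$$

The score-based Bayes factor then becomes

$$sBF = \frac{\beta_1^{\alpha_1}\, \Gamma(\alpha_2)\, \Gamma(\alpha_1 + \Delta(x, y))\, (\beta_2 + 1)^{\alpha_2 + \Delta(x, y)}}{\beta_2^{\alpha_2}\, \Gamma(\alpha_1)\, \Gamma(\alpha_2 + \Delta(x, y))\, (\beta_1 + 1)^{\alpha_1 + \Delta(x, y)}}. \tag{3.5}$$
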
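
Because (3.4) involves powers and gamma functions of possibly large arguments, it can be convenient to evaluate it on the log scale. The sketch below does so with `lgamma()`. It also uses the fact that the Poisson-gamma marginal is a negative binomial distribution with size $\alpha_i$ and success probability $\beta_i/(\beta_i + 1)$, so that `dnbinom()` offers an independent check. The function names and the illustrative hyperparameter values are assumptions of this sketch.

```r
# Log of the Poisson-gamma marginal (3.4) for a count x and a Ga(a, b) prior
log_marginal <- function(x, a, b) {
  a * log(b) - lgamma(a) + lgamma(a + x) - (a + x) * log(b + 1) - lfactorial(x)
}

# Score-based Bayes factor (3.5) computed on the log scale
sbf <- function(x, a1, b1, a2, b2) {
  exp(log_marginal(x, a1, b1) - log_marginal(x, a2, b2))
}

# Illustrative values: count of 4, Ga(125, 32) under H1 and Ga(7, 5) under H2
val <- sbf(4, a1 = 125, b1 = 32, a2 = 7, b2 = 5)

# Independent check via the negative binomial representation
nb <- dnbinom(4, size = 125, prob = 32 / 33) / dnbinom(4, size = 7, prob = 5 / 6)
c(val, nb)
```

The factor $1/\Delta(x, y)!$ is common to numerator and denominator and cancels in the ratio; it is kept in `log_marginal` only so that the function returns the marginal probability itself.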


Another example of the use of the Poisson distribution for data in the form 
of independent counts can be found in Aitken and Gold (2013). These authors 
considered the number of occurrences of selected characteristics of speech recorded 
in a succession of time periods. In this application, a feature-based Bayes factor 
is used to assess findings with respect to the proposition according to which 
recorded and control speeches originate from the same source versus the alternative 
proposition that they originate from different sources. 


Example 3.3 (Firearm Examination) Consider a case involving a questioned
bullet. During comparison with a reference bullet, the examiner counts four
CMS, i.e., $\Delta(x, y) = 4$. Suppose that the assumptions made in Bunch (2000)
are suitable for the case here so that for bullets fired from the same gun
(proposition $H_1$ holds), the weighted average maximum CMS is taken to be
equal to 3.91. For bullets fired from different guns (proposition $H_2$ holds), the
weighted average maximum CMS count is taken to be equal to 1.32. These
values are used in the Poisson likelihoods under $H_1$ and $H_2$, and the likelihood
ratio can easily be obtained as

> s=4
> lambda1=3.91
> lambda2=1.32
> LR=dpois(s,lambda1)/dpois(s,lambda2)
> LR
[1] 5.775487

The evidence provides weak support in favor of the proposition according to
which the recovered bullet passed through the barrel of the seized gun, rather
than through the barrel of an unknown gun.

Consider now the Bayesian perspective. Suppose that the available knowledge
allows one to set the hyperparameters of the gamma distribution equal
to $\{\alpha_1 = 125, \beta_1 = 32\}$ for the numerator and to $\{\alpha_2 = 7, \beta_2 = 5\}$ for the
denominator. This amounts to using a gamma prior distribution for $\lambda_1$ with
mean equal to 3.91 and standard deviation equal to 0.35 and a gamma prior
distribution for $\lambda_2$ with mean equal to 1.4 and standard deviation equal to
0.53. The two prior distributions are shown in Fig. 3.1.


> an=125
> bn=32
> ad=7
> bd=5
> plot(function(x) dgamma(x,an,bn),0,8,
+ xlab=expression(paste(lambda)),ylab='Probability density')
> plot(function(x) dgamma(x,ad,bd),0,8,add=TRUE,
+ lty=2)
> leg=expression(paste('Ga(125,32)'),paste('Ga(7,5)'))
> legend(4.85,1.15,leg,lty=c(1,2))


Fig. 3.1 Gamma prior for the Poisson parameter $\lambda$ under $H_1$ (solid line) and $H_2$ (dashed line)

First, we write a short function poisg that computes the marginal distribution
in (3.4), omitting the factor $1/\Delta(x, y)!$, which is common to the numerator and
the denominator and therefore cancels in the Bayes factor:

> poisg=function(a,b,x)
+ {(b^a)/gamma(a)*gamma(a+x)/((b+1)^(a+x))}

Next, the Bayes factor can be computed as follows:

> BF=poisg(an,bn,s)/poisg(ad,bd,s)
> BF
[1] 4.248019

Note that the introduction of a prior probability distribution reflecting uncertainty
about the population parameters $\lambda_1$ and $\lambda_2$ has slightly lowered the
value of the evidence. The result still represents weak evidence in favor of the
proposition that the recovered bullet was fired with the seized gun, rather than
with an unknown gun.


Note that Example 3.3 involves a non-anchored approach at the numerator. The
probability distribution of the score value is solely conditioned on the hypothesis of
interest, that is, $g(\Delta(x, y) \mid H_1)$. As mentioned at the beginning of this section, and
in Sect. 1.5.2, other anchoring approaches may be considered.


3.2.3.1 Choosing the Parameters of the Gamma Prior 


An evaluator who, initially, would like to give the same weight to all possible values
of $\lambda$ may consider using a non-informative prior distribution, that is

$$f(\lambda_i) = \lambda_i^{-1/2}, \quad \lambda_i > 0 \text{ and } i = 1, 2.$$

The posterior probability distribution given the observations $(z_1, \dots, z_n)$ will be
of type gamma with shape parameter $\alpha' = \sum_{i=1}^{n} z_i + 1/2$ and rate parameter
$\beta' = n$. Note that in the type of case considered here, there is only one observation;
therefore, $\alpha' = \Delta(x, y) + 1/2$ and $\beta' = 1$.

However, the choice of a non-informative prior distribution may be questioned.
Take, for instance, the case example discussed earlier in this section (Example 3.3).
It is difficult to imagine that no suitable information is available to express prior
uncertainty about the unknown weighted average maximum CMS count, and
hence that the same non-informative prior distribution should apply under each
proposition.

In Example 3.3, an informative prior distribution has been used. This raises the
question of how to translate prior knowledge into a prior distribution. As illustrated
in Sect. 1.10, one way to elicit prior parameters is to express prior beliefs in terms of
a measure of location and a measure of dispersion and then equate these values with
the prior moments of the distribution. In the case of a gamma distribution $Ga(\alpha, \beta)$,
this amounts to equating a value for the mean, $m$, with the prior mean $\alpha/\beta$, and a value
for the variance, $s^2$, with the prior variance $\alpha/\beta^2$, that is,

$$m = \frac{\alpha}{\beta}, \qquad s^2 = \frac{\alpha}{\beta^2}.$$

Solving for $\alpha$ and $\beta$ gives

$$\alpha = \frac{m^2}{s^2}, \tag{3.6}$$

$$\beta = \frac{m}{s^2}. \tag{3.7}$$

If the shape of the prior distribution resulting from the choice of $\alpha$ and $\beta$ as
in (3.6) and (3.7) does not reflect one’s prior beliefs suitably, then one should adjust
the numerical values of $m$ and $s$. However, this may not be enough to ensure that
the resulting prior distribution is reasonable. One should also inquire about whether
the information that is conveyed by the prior is realistically attainable. Consider a
random sample of size $n_e$, providing the same amount of information as conveyed by
the elicited prior. The sample mean should have, at least roughly, the same location
and the same dispersion as the prior. The equivalent sample size $n_e$ can then be
found by matching the moments of the gamma distribution to the corresponding
moments characterizing a sample of size $n_e$ from a Poisson distributed random
variable located at $\lambda$:

$$\frac{\alpha}{\beta} = \lambda, \qquad \frac{\alpha}{\beta^2} = \frac{\lambda}{n_e}.$$

If the mean $\lambda$ is set equal to the prior mean $\alpha/\beta$, the equivalent sample size $n_e$ is
equal to $\beta$.




Example 3.4 (Elicitation of a Gamma Prior) In Example 3.3, a $Ga(125, 32)$
was used for $\lambda_1$ (the weighted average maximum CMS count under proposition
$H_1$), and a $Ga(7, 5)$ for $\lambda_2$ (the weighted average maximum CMS count
under proposition $H_2$). For the prior means of $\lambda_1$ and $\lambda_2$, the values 3.91
and 1.4 were used following Bunch (2000). For the dispersion of the two
distributions, the values 0.35 and 0.53 have been assigned to the standard
deviation under propositions $H_1$ and $H_2$, respectively. Parameters ($\alpha_1 = 125$,
$\beta_1 = 32$) and ($\alpha_2 = 7$, $\beta_2 = 5$) have then been obtained as in (3.6)
and (3.7), after rounding. This amounts to an equivalent sample size equal to 32 for
the prior density of $\lambda_1$, and 5 for $\lambda_2$.
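
The moment-matching step in (3.6) and (3.7) is easily scripted. The following is a minimal sketch in which the function name `gamma_from_moments` is an illustrative choice; the input values are the prior means and standard deviations quoted in Example 3.4.

```r
# Gamma hyperparameters from a prior mean m and a prior standard deviation s,
# following (3.6) and (3.7); beta is also the equivalent sample size
gamma_from_moments <- function(m, s) {
  c(alpha = m^2 / s^2, beta = m / s^2)
}

h1 <- gamma_from_moments(m = 3.91, s = 0.35)   # prior under H1
h2 <- gamma_from_moments(m = 1.4,  s = 0.53)   # prior under H2

round(rbind(H1 = h1, H2 = h2), 2)
```

Rounding the resulting values to the nearest integer gives the hyperparameters (125, 32) and (7, 5) used in Example 3.3.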


3.2.3.2 Sensitivity to Prior Probabilities of Competing Propositions 


It is important to emphasize that the analyses presented here make no direct 
probabilistic statement about the truth of the propositions put forward by opposing 
parties at trial. A Bayes factor of approximately 4.25, as obtained in Example 3.3,
only means that the evidence is approximately 4 times more probable if proposition
$H_1$ is true than if the alternative proposition $H_2$ is true. As noted earlier, this does
not mean that proposition $H_1$ is more probable than $H_2$. This depends on the prior
probabilities of the competing propositions, which can vary considerably among
recipients of expert information, and which are beyond the area of competence of
scientists.

However, it may be of interest to show the impact of different prior probability
assignments on the posterior probability of the competing propositions. To do so,
recall that the posterior odds are given by the product of the prior odds and the
Bayes factor

$$\frac{\Pr(H_1 \mid \cdot)}{\Pr(H_2 \mid \cdot)} = BF \times \frac{\Pr(H_1)}{\Pr(H_2)}.$$

Using this expression, one can then investigate how the posterior probability of
proposition $H_1$, i.e., $\alpha_1$, varies for values of $\pi_1$, i.e., $\Pr(H_1)$, ranging from 0.01
to 0.99, and for a Bayes factor equal to 4.25, as in Example 3.3.


pil=seq(0.01,0.99,0.01) 
prior_odds=pil/(1-pil) 

BF=4.25 
post_odds=prior_odds«BF 
alphal=post_odds/(1+post_odds) 


Fig. 3.2 Posterior probability $\alpha_1$ of proposition $H_1$ for values of prior
probabilities $\pi_1$ ranging from 0.01 to 0.99, and a Bayes factor equal to 4.25
(solid line), 1 (dashed line), and 100 (dotted line)


The solid line in Fig. 3.2 shows the value of $\alpha_1$, the posterior probability of the
proposition $H_1$, as a function of the prior probability, $\pi_1$, for BF = 4.25. The plot
also shows results for BF = 1 (dashed line) and for BF = 100 (dotted line).


> plot(pi1,alpha1,type='l',xlab=expression(pi[1]),
+ ylab=expression(alpha[1]))
> BF=1
> post_odds=prior_odds*BF
> alpha1=post_odds/(1+post_odds)
> lines(pi1,alpha1,lty=2)
> BF=100
> post_odds=prior_odds*BF
> alpha1=post_odds/(1+post_odds)
> lines(pi1,alpha1,lty=3)


More generally, it can be observed that the higher the value of the Bayes factor, 
the smaller the impact of the prior probabilities on posterior probabilities. 
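
The following sketch tabulates this effect numerically. It is only an illustration: the helper `post_prob`, the selected prior probabilities, and the additional Bayes factor of 10000 are assumptions introduced here and are not part of the original analysis.

```r
# Posterior probability of H1 from a Bayes factor and a prior probability
# (hypothetical helper; it applies posterior odds = BF x prior odds).
post_prob <- function(bf, prior) {
  post_odds <- bf * prior / (1 - prior)
  post_odds / (1 + post_odds)
}

priors <- c(0.01, 0.10, 0.50, 0.90, 0.99)
bfs <- c(1, 4.25, 100, 10000)

# Rows: prior probabilities; columns: Bayes factors
tab <- sapply(bfs, function(bf) post_prob(bf, priors))
dimnames(tab) <- list(prior = priors, BF = bfs)
print(round(tab, 3))

# Spread of the posterior probability when the prior moves from 0.1 to 0.9
spread <- sapply(bfs, function(bf) diff(post_prob(bf, c(0.1, 0.9))))
print(round(spread, 4))
```

The spread shrinks as the Bayes factor grows, which is one way of reading the statement above for Bayes factors greater than 1.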


3.3 Evidence Evaluation for Continuous Data 


The previous section considered the evaluation of scientific evidence as given 
by discrete data. However, for many types of evidence, measurements result in 
continuous data.


3.3.1 Normal Model with Known Variance


In some applications, the distribution of measurements exhibits enough regularity 
to be captured by standard parametric models, such as the Normal distribution. 
One example, introduced earlier in Sect. 1.5.1, is the analysis of magnetism of 
black toner on printed documents. Due to the wide distribution and availability 
of printing machines, forensic document examiners are commonly requested to 
examine documents produced by electrophotographic printing processes that use 
dry toner. A question that forensic scientists may be asked to help with is whether 
or not two or more documents were printed with the same laser printer. This task 
involves the comparison of analytical features of a questioned document with those 
of control documents. One such analytical feature is the magnetic flux of toner. It is 
thought to be largely influenced by individual settings of the printing device, so that 
detectable differences may be expected on documents printed at different instances 
using the same or different machines (Biedermann et al., 2016a). 

Suspected page substitution is a commonly encountered problem in forensic 
document examination. Imagine a case involving a contract consisting of three 
pages where the allegation is that the second page has been substituted. It may be of 
interest, thus, to investigate the extent to which available measurements of magnetic 
flux can be informative in this case. 

Consider the following pair of propositions:


$H_1$: Page two has been printed by the device used for printing pages one and three
(i.e., the three pages have been printed with the same device).
$H_2$: Page two has been printed by a different device.


Denote by $y = (y_1, \ldots, y_n)$ the measurements of magnetic flux obtained for
the questioned page. Measurements are assumed to be normally distributed with
unknown mean $\theta$ and known variance $\sigma^2$. The likelihood of the normal random
sample $(y_1, \ldots, y_n)$ can therefore be expressed as

$$
f(y \mid \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y_i - \theta)^2\right\}. \tag{3.8}
$$

It can be shown (e.g., Bolstad and Curran, 2017) that the likelihood of a normal
random sample is proportional to the likelihood of the sample mean
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. The sample mean is normally distributed with mean $\theta$
and variance $\sigma^2/n$,

$$
f(\bar{y} \mid \theta) = (2\pi\sigma^2/n)^{-1/2} \exp\left\{-\frac{1}{2\sigma^2/n}(\bar{y} - \theta)^2\right\}. \tag{3.9}
$$

In other words, it is possible to reduce the problem to one where a single normal
observation $\bar{y}$ is available.
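
The proportionality can be checked numerically. The sketch below uses illustrative values (three measurements and a standard deviation of 0.24, chosen here only for the purpose of the check) and verifies that the ratio between the likelihood of the full sample and the likelihood of the sample mean does not depend on $\theta$.

```r
# Illustrative measurements and known standard deviation (assumed values)
y <- c(15, 16, 16)
sigma <- 0.24
n <- length(y)

# Grid of candidate values for the unknown mean theta
theta <- seq(15.2, 16.2, by = 0.1)

# Likelihood of the full sample, as in (3.8)
lik_sample <- sapply(theta, function(t) prod(dnorm(y, t, sigma)))

# Likelihood of the sample mean, as in (3.9)
lik_mean <- dnorm(mean(y), theta, sigma / sqrt(n))

# The ratio is the same constant for every theta
ratio <- lik_sample / lik_mean
print(signif(ratio, 6))
```

Since the ratio is constant in $\theta$, it cancels whenever the likelihood is normalized, which is why inference about $\theta$ may be based on $\bar{y}$ alone.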

Next, denote the measurements on uncontested pages by $\{x\} = (x_{lj},\ j =
1, \ldots, n$ and $l = 1, 2)$, where the subscript $l$ refers to the page number and $j$
indexes the measurements of magnetic flux obtained for page $l$. A normal
distribution with mean $\theta$ and variance $\sigma^2$ is assumed for $x$, analogously to what
has been assumed for $y$. A conjugate normal prior distribution is chosen for $\theta$, say
$\theta \sim N(\mu, \tau^2)$. The Bayes factor can be computed as in (1.16):

$$
\mathrm{BF} = \frac{f(\bar{y} \mid H_1)}{f(\bar{y} \mid H_2)}
= \frac{\int f(\bar{y} \mid \theta)\, f(\theta \mid x_1, x_2, H_1)\, d\theta}{\int f(\bar{y} \mid \theta)\, f(\theta \mid H_2)\, d\theta}, \tag{3.10}
$$

where $f(\theta \mid x_1, x_2, H_1)$ is the posterior distribution of $\theta$, obtained by updating the
prior distribution $N(\mu, \tau^2)$ using the measurements $x_1$ and $x_2$. This is a normal
distribution, $(\theta \mid x_1, x_2) \sim N(\mu_x, \tau_x^2)$, with posterior mean $\mu_x$ and posterior
variance $\tau_x^2$ computed according to the updating rules (2.13) and (2.14). Using
the result (1.21), one can easily verify that the density in the numerator is still a
normal distribution with mean equal to the posterior mean $\mu_x$ and variance equal to
the sum of the posterior variance $\tau_x^2$ and the population variance $\sigma^2$ divided by the
sample size $n$, i.e., $\tau_x^2 + \sigma^2/n$. In the same way, invoking (1.22), the density in the
denominator is still a normal distribution with mean equal to the prior mean $\mu$ and
variance equal to the sum of the prior variance $\tau^2$ and the population variance $\sigma^2$
divided by the sample size $n$, i.e., $\tau^2 + \sigma^2/n$.


Example 3.5 (Printed Documents) Consider the case described above where
a forensic document examiner measures the magnetic flux on two uncontested
pages 1 and 3 (Biedermann et al., 2016a). The results are $x_1 = (16, 15, 15)$
and $x_2 = (16, 15, 16)$. The measurements for the contested page 2 are $y =
(15, 16, 16)$. Previous experiments allow one to assign the value 0.24 for the
population standard deviation $\sigma$. Based on the available knowledge regarding
the magnetic flux of toner on printed documents, the prior mean $\mu$ and the
prior variance $\tau^2$ for the unknown quantity of magnetic flux are set equal
to 17.5 and $3.92^2$, respectively. This means that values of the magnetic flux
smaller than 6 and greater than 29 are considered, a priori, to be extremely
unlikely.


> mu=17.5
> tau2=3.92^2
> sigma2=0.24^2
> x=c(16,15,15,16,15,16)
> y=c(15,16,16)
> nx=length(x)
> ny=length(y)



The posterior distribution $f(\theta \mid x_1, x_2)$ can be obtained by a single application
of Bayes theorem with the full set of available measurements $(x_1, x_2)$.
The posterior parameters $\mu_x$ and $\tau_x^2$ can be calculated using the function
post_distr introduced in Sect. 2.3.1.


> mupost=post_distr(sigma2,nx,mean(x),mu,tau2)[1]
> mupost


[1] 15.50125


> tau2post=post_distr(sigma2,nx,mean(x),mu,tau2)[2]
> tau2post


[1] 0.009594006


The two marginal densities in the numerator and denominator of the BF
in (3.10) can be calculated at the sample mean $\bar{y}$. The exact value of the Bayes
factor is given by


> BF=dnorm(mean(y),mupost,sqrt(tau2post+sigma2/ny))/
+ dnorm(mean(y),mu,sqrt(tau2+sigma2/ny))
> BF


[1] 16.03199


This value represents moderate support for the proposition of no page
manipulation ($H_1$), compared to the proposition of page substitution ($H_2$).
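
As a cross-check, the closed-form marginal densities can be compared with a direct numerical evaluation of the integrals in (3.10). The sketch below is self-contained. The function `post_normal` is a hypothetical stand-in for post_distr of Sect. 2.3.1 (assumed here to apply the usual normal-normal updating rules), and the integration limits are a pragmatic choice rather than part of the original analysis.

```r
# Hypothetical stand-in for post_distr: posterior mean and variance of theta
post_normal <- function(sigma2, n, xbar, mu, tau2) {
  prec <- 1 / tau2 + n / sigma2
  c((mu / tau2 + n * xbar / sigma2) / prec, 1 / prec)
}

mu <- 17.5; tau2 <- 3.92^2; sigma2 <- 0.24^2
x <- c(16, 15, 15, 16, 15, 16)
y <- c(15, 16, 16)
ybar <- mean(y); ny <- length(y)
post <- post_normal(sigma2, length(x), mean(x), mu, tau2)

# Marginal density of ybar: integrate f(ybar | theta) f(theta) over theta.
# The integrand is negligible beyond 10 standard errors around ybar.
marginal <- function(m, v) {
  se <- sqrt(sigma2 / ny)
  integrate(function(th) dnorm(ybar, th, se) * dnorm(th, m, sqrt(v)),
            lower = ybar - 10 * se, upper = ybar + 10 * se)$value
}

bf_numeric <- marginal(post[1], post[2]) / marginal(mu, tau2)
bf_closed <- dnorm(ybar, post[1], sqrt(post[2] + sigma2 / ny)) /
  dnorm(ybar, mu, sqrt(tau2 + sigma2 / ny))
print(c(numeric = bf_numeric, closed_form = bf_closed))
```

Both routes should agree up to the tolerance of the numerical integration, which supports the use of the closed-form expression.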


3.3.2 Normal Model with Both Parameters Unknown 


So far, the variance of the distribution of the observations has been assumed to 
be known, though in many practical situations the mean and the variance are both 
unknown, and it is necessary to choose a prior distribution for the parameter vector 


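$(\theta, \sigma^2)$. The Bayes factor can be computed as in (1.16):

$$
\mathrm{BF} = \frac{f(y \mid x, H_1)}{f(y \mid H_2)}
= \frac{\int f(y \mid \theta, \sigma^2)\, f(\theta, \sigma^2 \mid x, H_1)\, d(\theta, \sigma^2)}{\int f(y \mid \theta, \sigma^2)\, f(\theta, \sigma^2 \mid H_2)\, d(\theta, \sigma^2)}. \tag{3.11}
$$

Consider the case where a conjugate prior distribution for $(\theta, \sigma^2)$ of the form

$$
f(\theta, \sigma^2) = f(\theta \mid \sigma^2)\, f(\sigma^2) \tag{3.12}
$$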


is chosen. In this distribution, prior beliefs about the population mean $\theta$ are
calibrated by the scale of measurements of the observations.¹ The conditional
distribution $f(\theta \mid \sigma^2)$ is taken to be normal, centered at $\mu$ with variance $\sigma^2/n_0$,
$(\theta \mid \sigma^2) \sim N(\mu, \sigma^2/n_0)$. The parameter $n_0$ can be thought of as the prior sample
size for the distribution of $\theta$. As pointed out in Sect. 2.3.1, it formalizes the size
of the sample from a normal population that provides an equivalent amount of
information about $\theta$. The distribution $f(\sigma^2)$ is taken to be an $S$ times inverse
chi-squared distribution with $k$ degrees of freedom, $\sigma^2 \sim S \cdot \chi^{-2}(k)$. It can be shown that
this is equivalent to an inverse gamma distribution with shape parameter $\alpha = k/2$
and scale parameter $\beta = S/2$, $\sigma^2 \sim IG(\alpha = k/2, \beta = S/2)$. Alternatively, prior
uncertainty about dispersion can be formulated in terms of the precision $\lambda^2 = 1/\sigma^2$.
The prior distribution of $\lambda^2$ becomes a gamma distribution with shape parameter
$\alpha = k/2$ and rate parameter $\beta = S/2$, $\lambda^2 \sim Ga(\alpha = k/2, \beta = S/2)$. For further
discussion, see e.g. Bernardo and Smith (2000), Bolstad and Curran (2017) and
Robert (2001).

Consider now the posterior distribution of the unknown parameter vector $(\theta, \lambda^2)$
once a vector of observations $x = (x_1, \ldots, x_n)$ becomes available. It takes the form
of a normal-gamma distribution

$$
f(\theta, \lambda^2 \mid x, H_1) = \mathrm{NG}(\mu_n, n', \alpha_n, \beta_n),
$$

with

$$
\mu_n = \frac{n\bar{x} + n_0\mu}{n + n_0}, \qquad n' = n + n_0,
$$

$$
\alpha_n = \alpha + \frac{n}{2}, \qquad
\beta_n = \beta + \frac{1}{2}\left[(n - 1)s^2 + \frac{n_0 n(\bar{x} - \mu)^2}{n_0 + n}\right],
$$

and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.

¹ Note that in (3.12) population parameters are not, a priori, independent. Whenever this condition
is felt to be too restrictive (see, e.g., Robert (2001)), it is also possible to choose a prior distribution
as the product of independent priors, $f(\theta, \sigma^2) = f(\theta) f(\sigma^2)$. In this case, the derivation of the
posterior distribution can be more demanding.

If uncertainty about the two unknown parameters is modeled by means of the 
conjugate prior distribution in (3.12), the integrations in (3.11) have an analytical 
solution and the BF can be obtained straightforwardly. 

Denote by $y = (y_1, \ldots, y_{n_y})$ a vector of measurements made on questioned
material and consider the sample mean $\bar{y} = \frac{1}{n_y}\sum_{i=1}^{n_y} y_i$. It can be proved that the
marginal density $f(\bar{y} \mid x, H_1)$ in the numerator is a Student t distribution with $2\alpha + n$
degrees of freedom, centered at $\mu_n$, with spread parameter, denoted $s_n$, equal to

$$
s_n = \frac{n_y(n + n_0)}{n + n_0 + n_y}\left(\alpha + \frac{n}{2}\right)\beta_n^{-1}.
$$

This can be denoted as $f_1(\bar{y} \mid \mu_n, s_n, 2\alpha + n)$.

The marginal density $f(\bar{y} \mid H_2)$ in the denominator is a Student t distribution
with $k = 2\alpha$ degrees of freedom, centered at $\mu$ with spread parameter (precision), denoted
$s_d$, equal to

$$
s_d = \frac{n_0 n_y}{n_0 + n_y}\,\alpha\,\beta^{-1}
$$

(Bernardo and Smith, 2000). This can be denoted as $f_2(\bar{y} \mid \mu, s_d, 2\alpha)$.
The Bayes factor can then be computed as

$$
\mathrm{BF} = \frac{f_1(\bar{y} \mid \mu_n, s_n, 2\alpha + n)}{f_2(\bar{y} \mid \mu, s_d, 2\alpha)}. \tag{3.13}
$$


Choosing the Parameters of the Normal Prior 


The use of a conjugate prior distribution for the mean and the variance of a 
normal distribution raises the question of how to choose the hyperparameters, as the 
resulting distribution should suitably reflect available prior knowledge. The prior 
distribution $f(\theta \mid \sigma^2)$ requires one to choose a value for $\mu$, the measure of location,
and a value for $n_0$. The ratio $n_0/n$ characterizes the relative precision of the prior
distribution compared to the precision of the observations. The smaller this ratio,
the less informative the prior distribution, and the closer the posterior
distribution is to that obtained using a non-informative prior distribution. In fact,
when $n_0/n$ approaches zero, the limiting form of the marginal distribution of the
population mean $\theta$ is $N(\bar{x}, \sigma^2/n)$, which corresponds to the posterior distribution
that would be obtained using a non-informative prior distribution (Robert, 2001).
For more specific prior beliefs (i.e., concentrated on a limited range of values), a
higher value of $n_0$ should be chosen.

Regarding the prior distribution of $\sigma^2$, consider a number of degrees of freedom
$k = 20$ so that the prior mass is distributed rather symmetrically. Suppose also
that, based on knowledge available from previous experiments, it is considered
that values of $\sigma^2$ greater or smaller than 0.05 are equally plausible, so $\Pr(\sigma^2 >
0.05) = 0.5$. The parameter $S$ can be elicited by recalling that $\sigma^2/S \sim \chi^{-2}(k)$ and,
analogously, $S \cdot \lambda^2 \sim \chi^2(k)$. Since $\lambda^2 = 1/\sigma^2$, the event $\sigma^2 > 0.05$ is equivalent
to $\lambda^2 < 20$, so

$$
\Pr(\sigma^2 > 0.05) = \Pr(S \cdot \lambda^2 < S \cdot 20) = 0.5,
$$

where $S \cdot 20$ is the quantile of order 0.5 of a $\chi^2$ distributed random variable with
$k = 20$ degrees of freedom.


> sigma2=0.05
> k=20
> p=0.5
> q=qchisq(p,k)
> q


[1] 19.33743


> S=q*sigma2


Parameter $S$ is then equal to

$$
S = 19.3374 \times 0.05 \approx 1.
$$

The elicited prior distribution for $\sigma^2$ is $IG(20/2, 1/2)$ and is shown in Fig. 3.3.
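
The elicitation can be verified with base R alone. The sketch below works on the precision scale, where $\lambda^2 \sim Ga(k/2, S/2)$ with $S/2$ a rate parameter. It reports the prior probability that $\sigma^2$ exceeds 0.05 for the exact value of $S$ and for the rounded value $S = 1$. The helper name `prob_above` is introduced here for illustration only.

```r
k <- 20
S_exact <- qchisq(0.5, k) * 0.05

# Pr(sigma^2 > threshold) = Pr(lambda^2 < 1/threshold),
# with lambda^2 ~ Gamma(shape = k/2, rate = S/2)
prob_above <- function(S, threshold = 0.05) {
  pgamma(1 / threshold, shape = k / 2, rate = S / 2)
}

print(c(S_exact = S_exact,
        prob_exact = prob_above(S_exact),
        prob_rounded = prob_above(1)))

# Same probability from the chi-squared form S * lambda^2 ~ chi^2(k)
print(pchisq(S_exact / 0.05, df = k))
```

Rounding $S$ to 1 shifts the prior probability slightly away from 0.5, a difference that may be considered negligible for practical purposes.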


Fig. 3.3 Inverse Gamma prior distribution $IG(20/2, 1/2)$ for $\sigma^2$ in Example 3.6


Example 3.6 (Printed Documents, Continued) Consider again Example 3.5
where magnetic flux was measured on uncontested and questioned pages.
The population variance $\sigma^2$ was assumed known and equal to 0.0576.
Suppose now that a new measuring device is used and that the number
of previous experiments (i.e., measurements) conducted with this device is
limited. A conjugate prior distribution as in (3.12) is introduced to model
prior uncertainty about $\theta$ and $\sigma^2$.

The prior distribution for $\theta \mid \sigma^2$ can be centered at $\mu = 17.5$ as in
Example 3.5 with $n_0 = 0.004$ reflecting a very weak prior belief with respect
to the precision of the observations, $(\theta \mid \sigma^2) \sim N(17.5, \sigma^2/0.004)$.


> mu=17.5
> n0=0.004


The prior distribution about $\sigma^2$ has been elicited above, with $k = 20$ degrees
of freedom, and $S = 1$, $\sigma^2 \sim IG(20/2, 1/2)$, shown in Fig. 3.3.


> library(extraDistr)
> S=1
> k=20
> plot(function(x) dinvgamma(x,k/2,S/2),0,0.2,
+ xlab=expression(paste(sigma)^2),ylab='')


Note that the function dinvgamma is available in the package extraDistr
(Wolodzko, 2020). Measurements are the same as in Example 3.5.


> x=c(16,15,15,16,15,16)
> y=c(15,16,16)
> n=length(x)
> ny=length(y)


Let us first consider the marginal density in the numerator of the Bayes factor
in (3.13). It is a Student t distribution with $2\alpha + n = k + n = 26$ degrees of
freedom, centered at $\mu_n = 15.5$ with spread parameter $s_n = 20.6724$.


> mun=(n*mean(x)+n0*mu)/(n+n0)
> mun


[1] 15.50133


> s2=sum((x-mean(x))^2)
> bn=S/2+(s2+n0*n*(mean(x)-mu)^2*(n0+n)^(-1))/2
> sn=ny*(n+n0)/(n+n0+ny)*(k+n)/2*bn^(-1)
> sn


[1] 20.6724



The marginal density in the denominator of the Bayes factor in (3.13) is a
Student t distribution with $2\alpha = k = 20$ degrees of freedom, centered at
$\mu = 17.5$ with spread parameter $s_d = 0.0799$.


> sd=ny*n0/ (n0+ny) *k/S 
> sd 


[1] 0.07989348 


The density of a Student t distributed random variable with location and
precision parameters can be calculated using the function dstp, available in the
package LaplacesDemon (Hall et al., 2020). The Bayes factor can be obtained as


> library(LaplacesDemon)
> BF=dstp(mean(y),mun,sn,k+n)/dstp(mean(y),mu,sd,k)
> BF


[1] 13.88188


The Bayes factor represents moderate support for the proposition according 
to which page two has been printed by the same device as the one used for 
printing pages one and three, compared to the proposition according to which 
page two has been printed by a different device. 


It is worth emphasizing that the BF is highly sensitive to the choice of the prior 
(see Sect. 1.11). A sensitivity analysis should therefore be conducted. 
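
A minimal sketch of such a sensitivity analysis is given below for the prior sample size $n_0$, using the data of Example 3.6. To keep the example self-contained, the Student t density with location and precision parameters is written in terms of the base R function dt. The helper names `dst_prec` and `bf_normal_gamma`, as well as the grid of values for $n_0$, are assumptions made for illustration.

```r
# Student t density with location mu, precision tau and nu degrees of freedom,
# obtained by rescaling the standard t density of base R
dst_prec <- function(x, mu, tau, nu) sqrt(tau) * dt((x - mu) * sqrt(tau), nu)

# Bayes factor (3.13) under the conjugate normal-gamma prior
bf_normal_gamma <- function(x, y, mu, n0, k, S) {
  n <- length(x); ny <- length(y)
  mun <- (n * mean(x) + n0 * mu) / (n + n0)
  bn <- S / 2 + (sum((x - mean(x))^2) + n0 * n * (mean(x) - mu)^2 / (n0 + n)) / 2
  s_n <- ny * (n + n0) / (n + n0 + ny) * (k + n) / 2 / bn
  s_d <- ny * n0 / (n0 + ny) * k / S
  dst_prec(mean(y), mun, s_n, k + n) / dst_prec(mean(y), mu, s_d, k)
}

x <- c(16, 15, 15, 16, 15, 16)
y <- c(15, 16, 16)

# Illustrative grid of prior sample sizes (only 0.004 appears in Example 3.6)
n0_grid <- c(0.004, 0.04, 0.4, 4)
bf_grid <- sapply(n0_grid, function(n0)
  bf_normal_gamma(x, y, mu = 17.5, n0 = n0, k = 20, S = 1))
print(data.frame(n0 = n0_grid, BF = bf_grid))
```

The first row corresponds to the setting of Example 3.6. The remaining rows indicate how the value of the Bayes factor changes as the prior on $\theta$ becomes more concentrated around $\mu = 17.5$.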


3.3.3 Normal Model for Inference of Source 


Consider again a case as described in Sect. 3.3.1, involving the analysis of toner on 
printed documents. Magnetic flux was considered as a feature of interest because it 
is largely influenced by the settings of the printing device. Suppose now that more 
than one potential source (i.e., printing device) is available for examination. The 
issue of interest is which of two machines has been used to print a questioned 
document (e.g., a contested contract). The propositions of interest can be defined 
as follows: 


$H_1$: The questioned document has been printed with machine A.
$H_2$: The questioned document has been printed with machine B.


The two potential sources, i.e., machines A and B, are used to print documents
under controlled conditions. The measurements made on documents printed by
the two devices are denoted $\{x_p\} = (x_{pi},\ p = A, B$ and $i = 1, \ldots, m)$, with
$x_{pi} = (x_{pi1}, \ldots, x_{pin})$ denoting the vector of $n$ measurements for each analyzed
page, $i = 1, \ldots, m$, from each printer $p = A, B$. Measurements are assumed to
be normally distributed with unknown mean $\theta_p$, $p = A, B$, and variance $\sigma^2$. The
variance is assumed to be known and equal for the two devices. A conjugate normal
prior distribution is taken for the unknown mean $\theta_p$, say $\theta_p \sim N(\mu_p, \tau_p^2)$, $p = A, B$.


Measurements on the questioned document are denoted by $y = (y_1, \ldots, y_q)$,
with $y_j = (y_{j1}, \ldots, y_{jn})$ denoting the vector of $n$ measurements from each
contested page $j = 1, \ldots, q$. For cases in which $q > 1$, it is assumed that all
pages have been printed with a single device. The distribution of measurements
on the questioned document is also taken to be normal. The sample mean $\bar{y} =
\frac{1}{nq}\sum_{j=1}^{q}\sum_{k=1}^{n} y_{jk}$ has a normal distribution with mean $\theta_p$ and variance $\sigma^2/nq$,
$(\bar{Y} \mid \theta_p, \sigma^2/nq) \sim N(\theta_p, \sigma^2/nq)$.


The Bayes factor can be computed as 


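$$
\mathrm{BF} = \frac{\int f(\bar{y} \mid \theta_A)\, f(\theta_A \mid x_A)\, d\theta_A}{\int f(\bar{y} \mid \theta_B)\, f(\theta_B \mid x_B)\, d\theta_B}
= \frac{f(\bar{y} \mid x_A, H_1)}{f(\bar{y} \mid x_B, H_2)}. \tag{3.14}
$$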


The marginal probability density in the numerator can be obtained in closed
form. It is a normal distribution with mean equal to the posterior mean $\mu_{A,x}$, and
variance equal to the sum of the posterior variance $\tau^2_{A,x}$ and population variance
$\sigma^2/nq$ (where $nq$ is the total number of observations), that is, $f(\bar{y} \mid x_A, H_1) =
N(\mu_{A,x}, \tau^2_{A,x} + \sigma^2/nq)$. In the same way, one can obtain the marginal probability
density in the denominator, $f(\bar{y} \mid x_B, H_2) = N(\mu_{B,x}, \tau^2_{B,x} + \sigma^2/nq)$. As observed
in Sect. 3.3.1, the numerator and the denominator of (3.14) can be calculated as the
densities of two normally distributed random variables, $N(\mu_{A,x}, \tau^2_{A,x} + \sigma^2/nq)$
and $N(\mu_{B,x}, \tau^2_{B,x} + \sigma^2/nq)$, at the sample mean $\bar{y}$ of the measurements on the
questioned document.


Example 3.7 (Printed Documents) Consider a type of case and propositions
as introduced above, and suppose that there is only one contested page, that is,
$q = 1$. Measurements of the magnetic flux lead to the following results: $y =
(20, 20, 21)$ (i.e., $n = 3$ measurements are taken). Two pages are printed with
each printing device. The results are as follows (Biedermann et al., 2016a):


          Printer A     Printer B
Page 1    20 20 19      21 20 21
Page 2    20 21 20      21 22 21

The available data thus are 


> xa=c(20,20,19,20,21,20)
> xb=c(21,20,21,21,22,21)
> y=c(20,20,21)
> n=length(y)


The population standard deviation $\sigma$ is taken to be equal to 0.24, as in
Example 3.5. We also choose the same prior distribution as used in Example
3.5 to describe uncertainty about the magnetic flux of toner printed by the two
printing devices. Thus, $\mu_A = \mu_B = 17.5$ and $\tau_A^2 = \tau_B^2 = 3.92^2$.


> sigma2=0.24^2
> na=length(xa)
> nb=length(xb)
> mu=17.5
> tau2=3.92^2


The posterior distributions $f(\theta_A \mid x_A)$ and $f(\theta_B \mid x_B)$ can be obtained
by a single application of Bayes theorem using the full set of available
measurements for each printer. The posterior parameters $\mu_{A,x}$, $\mu_{B,x}$, $\tau^2_{A,x}$,
and $\tau^2_{B,x}$ can be calculated using the function post_distr:


> muapost=post_distr(sigma2,na,mean(xa),mu,tau2)[1]
> tauapost=post_distr(sigma2,na,mean(xa),mu,tau2)[2]
> mubpost=post_distr(sigma2,nb,mean(xb),mu,tau2)[1]
> taubpost=post_distr(sigma2,nb,mean(xb),mu,tau2)[2]


The two marginal densities in the numerator and denominator of the BF
in (3.14) can be calculated at the observed value $\bar{y}$. The BF can thus be
computed as the ratio of two marginal densities:


> BF=dnorm(mean(y),muapost,sqrt(sigma2/n+tauapost))/
+ dnorm(mean(y),mubpost,sqrt(sigma2/n+taubpost))
> BF


[1] 304.7886


This value represents moderately strong support for the proposition according
to which the questioned page has been printed using device A, rather than using
device B.


Consider a "0-$l_i$" loss function as in Table 1.4. The optimal decision is to accept
the view according to which the questioned page was printed by device A (as
stated by proposition $H_1$), rather than by device B, whenever

$$
\mathrm{BF} > \frac{l_1/l_2}{\pi_1/\pi_2}.
$$

If the prior odds are even, and a symmetric loss function is felt to be appropriate, the
Bayes decision is to accept the view according to which the questioned document
has been printed with machine A (B) whenever the BF is greater (smaller) than 1.
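
The decision rule can be sketched as a small function. The code below is an illustration under an explicit assumption about the loss function, which is defined in Table 1.4 and not reproduced here: $l_1$ is taken to be the loss incurred when $H_1$ is wrongly accepted, and $l_2$ the loss incurred when $H_2$ is wrongly accepted. The function name `bayes_decision` and the numerical inputs are hypothetical.

```r
# Assumption: l1 = loss of wrongly accepting H1, l2 = loss of wrongly accepting H2.
# Accepting H1 minimizes the expected loss when BF > (l1/l2) / (pi1/pi2).
bayes_decision <- function(bf, pi1, l1 = 1, l2 = 1) {
  threshold <- (l1 / l2) / (pi1 / (1 - pi1))
  ifelse(bf > threshold, "H1", "H2")
}

# Even prior odds and symmetric loss: the threshold equals 1
print(bayes_decision(304.7886, pi1 = 0.5))

# Wrongly accepting H1 judged 1000 times worse than wrongly accepting H2
print(bayes_decision(304.7886, pi1 = 0.5, l1 = 1000, l2 = 1))

# Several Bayes factors with prior probability 0.1 for H1 (threshold equal to 9)
print(bayes_decision(c(0.5, 2, 50), pi1 = 0.1))
```

The sketch makes explicit that the same Bayes factor may lead to different decisions, depending on the prior odds and on the losses specified by the decision-maker.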

When available information is limited, one may choose a non-informative prior
distribution for $(\theta, \sigma^2)$ that can be specified as

$$
f(\theta, \sigma^2) = \frac{1}{\sigma^2}. \tag{3.15}
$$

In this case, the marginal distribution in the numerator of the BF is proportional to a
Student t distribution with $n_A - 1$ degrees of freedom, centered at the sample mean
$\bar{x}_A$ with spread parameter $s_n$ equal to

$$
s_n = \frac{n_A\, nq}{(n_A + nq)\, s_A^2},
$$

where $s_A^2 = \frac{1}{n_A - 1}\sum_{i=1}^{n_A}(x_{A,i} - \bar{x}_A)^2$, $n_A$ is the total number of observations from
device A, and $nq$ is the total number of measurements from the $q$ contested
pages (i.e., $n$ measurements for each contested page). This can be denoted as
$f_1(\bar{y} \mid \bar{x}_A, s_n, n_A - 1)$.

Likewise, the marginal distribution in the denominator of the BF is proportional
to a Student t distribution with $n_B - 1$ degrees of freedom, centered at the sample
mean $\bar{x}_B$ with spread parameter $s_d$ equal to

$$
s_d = \frac{n_B\, nq}{(n_B + nq)\, s_B^2},
$$

where $s_B^2 = \frac{1}{n_B - 1}\sum_{i=1}^{n_B}(x_{B,i} - \bar{x}_B)^2$ and $n_B$ is the total number of observations from
device B. This can be denoted as $f_2(\bar{y} \mid \bar{x}_B, s_d, n_B - 1)$.
The Bayes factor can then be obtained as

$$
\mathrm{BF} = \frac{f_1(\bar{y} \mid \bar{x}_A, s_n, n_A - 1)}{f_2(\bar{y} \mid \bar{x}_B, s_d, n_B - 1)}. \tag{3.16}
$$


Example 3.8 (Printed Documents, Continued) In Example 3.7, a normal
prior distribution has been used for $(\theta, \sigma^2)$. Consider now a non-informative
prior distribution as in (3.15). In order to compute the Bayes factor, one must
first obtain the spread parameters $s_n$ and $s_d$ under the competing propositions.


> s2a=var(xa)
> sn=na*n/((na+n)*s2a)
> s2b=var(xb)
> sd=nb*n/((nb+n)*s2b)


Note that in this case the number of contested pages $q$ is set equal to 1. The
density of a Student t distributed random variable with location and precision
parameters can be obtained using the function dstp available in the package
LaplacesDemon (Hall et al., 2020). The Bayes factor can be obtained as follows:


> library(LaplacesDemon)
> BF=dstp(mean(y),mean(xa),sn,na-1)/
+ dstp(mean(y),mean(xb),sd,nb-1)
> BF


[1] 2.197


The Bayes factor represents weak support for the proposition according to 
which the questioned document has been printed with machine A, rather than 
with machine B. 


More Than Two Propositions 


Consider now the case where more than two devices are available. As in Sect. 1.6, 
the question is how to evaluate measurements made on questioned and known items 
(i.e., documents), as the BF involves pairwise comparisons. A scaled version of the 
marginal likelihood may be reported as in (1.27). 


Example 3.9 (Printed Documents, More Than Two Propositions) Recall 
Example 3.7, and assume that a third printer, machine C, is available for 
comparative examinations. The propositions of interest are therefore:


$H_1$: The questioned document has been printed with machine A.
$H_2$: The questioned document has been printed with machine B.
$H_3$: The questioned document has been printed with machine C.


Two pages are printed with the additional printing device C. All results, 
including those from machines A and B, are as follows: 




          Printer A     Printer B     Printer C
Page 1    20 20 19      21 20 21      21 20 21
Page 2    20 21 20      21 22 21      20 21 20


Let the prior distribution describing uncertainty about the magnetic flux
characterizing machine C be the same as introduced previously, that is $\mu_C =
17.5$ and $\tau_C^2 = 3.92^2$. First, the posterior distribution $f(\theta_C \mid x_C)$ is calculated:


> xc=c(21,20,21,20,21,20)
> nc=length(xc)
> mucpost=post_distr(sigma2,nc,mean(xc),mu,tau2)[1]
> taucpost=post_distr(sigma2,nc,mean(xc),mu,tau2)[2]


Next, consider the marginal likelihoods of the sample mean that can be
obtained as


> mla=dnorm(mean(y),muapost,sqrt(sigma2/n+tauapost))
> mlb=dnorm(mean(y),mubpost,sqrt(sigma2/n+taubpost))
> mlc=dnorm(mean(y),mucpost,sqrt(sigma2/n+taucpost))


The scaled version of the marginal likelihoods then is


> smla=mla/(mla+mlb+mlc)
> smlb=mlb/(mla+mlb+mlc)
> smlc=mlc/(mla+mlb+mlc)
> round(c(smla,smlb,smlc),5)


[1] 0.18593 0.00061 0.81346


Recall from Sect. 1.6 that this is equivalent to reporting the posterior probability
of competing propositions with equal prior probabilities. Therefore,
if $\Pr(H_1) = \Pr(H_2) = \Pr(H_3) = 1/3$, then proposition $H_3$ has received the
greatest evidential support.


Alternatively, the analyst may also consider the possibility of aggregating
propositions $H_1$ and $H_2$ and consider:


$H_1$: The questioned document has been printed with machine C.
$\bar{H}_1$: The questioned document has been printed with machine A or B.


Example 3.10 (Printed Documents, More Than Two Propositions,
Continued) When considering a single proposition $H_1$ compared to a
composite proposition $\bar{H}_1$ as defined above, the Bayes factor can be obtained
as in (1.28), with $\Pr(H_1) = 1/3$ and $\Pr(\bar{H}_1) = 2/3$.


> p=1/3
> mlc*(1-p)/(mla*p+mlb*p)


[1] 8.72179
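
The computations of Examples 3.9 and 3.10 can be collected in a single self-contained sketch. The function `post_normal` is a hypothetical stand-in for post_distr (assumed to implement the usual normal-normal updating), and `bf_composite` is a helper introduced here. It weights the marginal likelihoods of the aggregated propositions by their prior probabilities, which is the form used in the example above. Whether this matches (1.28) in every detail cannot be verified from this section.

```r
# Hypothetical stand-in for post_distr: posterior mean and variance of theta
post_normal <- function(sigma2, n, xbar, mu, tau2) {
  prec <- 1 / tau2 + n / sigma2
  c((mu / tau2 + n * xbar / sigma2) / prec, 1 / prec)
}

sigma2 <- 0.24^2; mu <- 17.5; tau2 <- 3.92^2
y <- c(20, 20, 21)
printers <- list(A = c(20, 20, 19, 20, 21, 20),
                 B = c(21, 20, 21, 21, 22, 21),
                 C = c(21, 20, 21, 20, 21, 20))

# Marginal likelihood of the sample mean under each printer
ml <- sapply(printers, function(x) {
  post <- post_normal(sigma2, length(x), mean(x), mu, tau2)
  dnorm(mean(y), post[1], sqrt(sigma2 / length(y) + post[2]))
})

# Scaled marginal likelihoods (equal prior probabilities)
scaled <- ml / sum(ml)
print(round(scaled, 5))

# Bayes factor for one printer versus the aggregate of the remaining ones
prior <- c(A = 1, B = 1, C = 1) / 3
bf_composite <- function(target) {
  others <- setdiff(names(ml), target)
  ml[[target]] / (sum(ml[others] * prior[others]) / sum(prior[others]))
}
print(sapply(names(ml), bf_composite))
```

The entry for printer C reproduces the comparison of machine C against machines A or B.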


3.3.4 Score-Based Bayes Factor 


As mentioned previously in Sect. 1.5.2, it may not be possible to specify a
probability model for some types of forensic evidence and data. An example was 
given in Sect. 3.2.3 for discrete data regarding consecutive matching striations, used 
to quantify the extent of agreement between marks on bullets. 

Consider now a case where a saliva trace is collected at the crime scene. The 
salivary microbiome is analyzed as well as that of traces originating from a known 
source, Mr. X, with the aim of discriminating between the following competing 
propositions: 


$H_1$: The saliva trace comes from Mr. X.
$H_2$: The saliva trace comes from the twin brother of Mr. X.


Note that the proposition $H_2$ represents an extreme case of relatedness. To
investigate this type of case, consider the data collected by Scherz (2021). This 
longitudinal study involving 30 monozygotic twins has shown the potential of 
salivary microbiome profiles to discriminate between closely related individuals 
(Scherz et al., 2021). This may represent an alternative method when standard DNA 
profiling analyses yield no useful results. 

In the study by Scherz (2021), four salivary samples were collected from
each participant: the first at the beginning of the study, and the others after 1, 12,
and 13 months. Given the complex composition of microbiota, a distance can be
calculated to compare microbiota profiles. One possibility is the Jaccard distance,
obtained as one minus the ratio of the number of amplicon sequence variants (ASVs)
shared by the two samples to the number of distinct ASVs in the two compared samples.
This measure has shown good discriminatory power. Other distances (e.g.,
Jensen-Shannon) can be calculated (Scherz, 2021).
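
For illustration, a Jaccard distance between two samples can be computed as follows. This is a minimal sketch based on the usual set-based definition (one minus the proportion of shared variants among all distinct variants). The ASV labels are invented, and any preprocessing applied in Scherz (2021) is not reproduced here.

```r
# Jaccard distance between two samples described by the ASVs they contain
jaccard_distance <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Invented ASV labels for two hypothetical saliva samples
sample_1 <- c("ASV1", "ASV2", "ASV3", "ASV5")
sample_2 <- c("ASV2", "ASV3", "ASV4", "ASV5", "ASV6")

print(jaccard_distance(sample_1, sample_2))
```

A distance of 0 corresponds to identical ASV sets, and a distance of 1 to samples that share no ASV.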

The intra-individual variability was studied by comparing all four samples of 
each individual. The intra-pair variability was evaluated by comparing pairs of 
samples from related individuals (here: monozygotic twins). The inter-individual
variability was studied by comparing samples of unrelated individuals (Fig. 3.4).

Fig. 3.4 Jaccard distances for salivary microbiota compositions of pairs of
samples from individual persons (intra-individual), pairs of related persons
(intra-pair), and pairs of unrelated persons (unrelated) [Source of data: (Scherz
et al., 2021)]


Let $\delta(x, y)$ denote the distance between the analytical features of questioned
material (i.e., a saliva trace of unknown origin) and control material (i.e., a saliva
sample from Mr. X). A score-based Bayes factor (sBF) can be defined as follows:

$$
\mathrm{sBF} = \frac{g(\delta(x, y) \mid H_1)}{g(\delta(x, y) \mid H_2)}. \tag{3.17}
$$


To obtain a value for this sBF, it is necessary to study the probability distribution 
of the calculated score under the competing propositions. However, the limited 
number of samples per individual, available for pairwise comparison, might make it 
difficult to assess the numerator, which is specific for a given person of interest. To 
address this problem, Davis et al. (2012) propose the use of a database of simulated 
samples to help with the construction of probability distributions for scores. 

In the example studied here, a maximum number of 6 intra-volunteer comparisons
are available for each participant. A viable alternative is to perform
a so-called common-source comparison,² and use the limited number of items
from all participants, provided that one is willing to assume a generic probability 
distribution for all individuals in the numerator. In the same way, a generic 
probability distribution is used at the denominator in all cases where a twin is 
assumed as the alternative source of the salivary trace (Bozza et al., 2022). 

Denote by $\{z^1_{ij},\ i = 1, \ldots, m_1,\ j = 1, \ldots, n_1\}$ the intra-individual distances
and by $\{z^2_{ij},\ i = 1, \ldots, m_2,\ j = 1, \ldots, n_2\}$ the intra-pair distances, where $m_1$ ($m_2$)
is the number of distinct individuals (pairs of twins) and $n_1$ ($n_2$) is the
number of distances calculated for each individual (pair). A normal distribution is
used for both the numerator and denominator to model the within-source variation
(i.e., the variation between distances characterizing materials originating from the
same individual and from the same pair of twins, respectively), $Z^p_{ij} \sim N(\theta_p, \sigma_p^2)$,
where $p = \{1, 2\}$. Different distributions can be used to describe the between-source
variation (i.e., the variation between distances characterizing materials originating
from different individuals and from different pairs of twins, respectively). Here, a
normal distribution is retained, $\theta_p \sim N(\mu_p, \tau_p^2)$. The mean between sources
$\mu_p$, the within-source variance $\sigma_p^2$, and the between-source variance $\tau_p^2$ can be
estimated from the background data:

$$
\hat{\mu}_p = \bar{z}_p = \frac{1}{m_p n_p}\sum_{i=1}^{m_p}\sum_{j=1}^{n_p} z^p_{ij}, \tag{3.18}
$$

$$
\hat{\sigma}_p^2 = \frac{1}{m_p(n_p - 1)}\sum_{i=1}^{m_p}\sum_{j=1}^{n_p}\left(z^p_{ij} - \bar{z}^p_i\right)^2, \tag{3.19}
$$

$$
\hat{\tau}_p^2 = \frac{1}{m_p - 1}\sum_{i=1}^{m_p}\left(\bar{z}^p_i - \bar{z}_p\right)^2 - \frac{\hat{\sigma}_p^2}{n_p}, \tag{3.20}
$$

where $\bar{z}^p_i = \frac{1}{n_p}\sum_{j=1}^{n_p} z^p_{ij}$.

² See Sect. 1.5.2 on the difference between specific-source and common-source propositions.
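
A sketch of how such estimates may be obtained from background data is given below. It assumes a balanced design (the same number of distances for every source) and the usual moment estimators for a two-level normal model. The function name `two_level_estimates` and the toy matrix are introduced here for illustration and do not represent real distances.

```r
# z: matrix of distances with one row per source (individual or pair of twins)
# and one column per replicate distance; a balanced design is assumed.
two_level_estimates <- function(z) {
  m <- nrow(z)
  n <- ncol(z)
  source_means <- rowMeans(z)
  mu_hat <- mean(z)                                          # overall mean
  sigma2_hat <- sum((z - source_means)^2) / (m * (n - 1))    # within-source variance
  tau2_hat <- sum((source_means - mu_hat)^2) / (m - 1) - sigma2_hat / n  # between-source
  c(mu = mu_hat, sigma2 = sigma2_hat, tau2 = tau2_hat)
}

# Toy data: 3 sources, 3 replicates each (not real distances)
z <- matrix(c(1, 2, 3,
              2, 3, 4,
              6, 7, 8), nrow = 3, byrow = TRUE)
est <- two_level_estimates(z)
print(est)
```

Note that the moment estimator of the between-source variance can turn out negative when the source means are very similar. In such a case, the model assumptions should be reconsidered.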


Example 3.11 (Saliva Traces) Consider a case where a saliva trace is
recovered at a crime scene and a sample is taken from a person of interest
for comparative purposes. The Jaccard distance between the microbiota
composition of recovered and control sample is equal to 0.51.


> d=0.51


The propositions are $H_1$, the compared items come from the same source,
and $H_2$, the compared items come from different sources (twins). Suppose
that the estimated means between sources in (3.18) are 0.454 and 0.769;
the estimated within-source variances in (3.19) are 0.0057 and 0.00067; the
estimated between-source variances in (3.20) are 0.0028 and 0.0024 (Source
of data: Scherz (2021)).


> mu1=0.454
> mu2=0.769
> sigma1=0.0057
> sigma2=0.00067
> tau1=0.0028
> tau2=0.0024


The Bayes factor can then be obtained straightforwardly as in (3.17)


> BF=dnorm(d,mu1,sqrt(tau1+sigma1))/
+ dnorm(d,mu2,sqrt(tau2+sigma2))
> BF


[1] 27766.33


The Bayes factor provides very strong support for the proposition that the 
saliva traces originate from the same individual rather than from two different 
individuals (twins). 


Note that a higher value of the BF is expected whenever the alternative
proposition $H_2$ involves unrelated individuals. The inspection of Fig. 3.4 highlights
that higher distances are recorded in this type of case.

The between-source variability can also be modeled by a kernel density distri- 
bution, as presented in Bozza et al. (2022). See also Sect. 3.4.1.2, where a detailed 
description of the kernel density approach is given for two-level multivariate data. 


3.4 Multivariate Data 


Forensic scientists encounter multivariate data in contexts where the examined 
objects and materials can be described by several variables. Examples are glass 
fragments that are searched and recovered on the clothing of a person of interest 
and on a crime scene, or seized materials supposed to contain illicit substances. Such 
materials may be analyzed and compared on the basis of their chemical compounds 
as well as their physical characteristics. Multivariate data also arise in other forensic 
science disciplines, such as handwriting examination. Handwritten characters can, 
in fact, be described by means of several variables, such as the width, the height, 
the surface, the orientation of the strokes, or by Fourier descriptors (Marquis et al., 
2005). In addition, an emerging topic that forensic document examiners nowadays 
encounter is handwriting (e.g., signatures) on digital tablets. Such electronic devices 
provide several static (e.g., length of a signature) and dynamic features (e.g., 
speed) that can be used as variables to describe signatures (Linden et al., 2018). 
These developments have led to substantial databases that often present a complex 
dependence structure, a large number of variables, and multiple sources of variation. 
3.4.1 Two-Level Models


Denote by p the number of characteristics (variables) observed on items of a 
particular evidential type. Suppose that continuous measurements of these variables 
are available on a random sample of m sources with n items from each source. For 
handwriting evidence, a source is a single writer, with n characters from each writer 
and p observed characteristics that pertain to the shape of handwritten characters. 
For glass evidence, a source is a window, with n replicate measurements from a 
glass fragment originating from each window and p observed characteristics given 
by concentrations in elemental composition. The background data can be denoted
by $z_{ij} = (z_{ij1}, \ldots, z_{ijp})$, where $i = 1, \ldots, m$ indexes the sources (e.g.,
windows), $j = 1, \ldots, n$ indexes the items for each source (e.g., replicate
measurements from a glass fragment), and $p$ is the number of variables.

This data structure suggests a two-level hierarchy, accounting for two sources of 
variation: the variation between replicate measurements within the same source (the 
so-called within-source variation) and the variation between sources (the so-called 
between-source variation). 


3.4.1.1 Normal Distribution for the Between-Source Variability 


In some applications, data exhibit regularity that can reasonably be described 
using standard probabilistic models. For example, the within-source variability 
and the between-source variability may be modeled by a normal distribution. 
A Bayesian statistical model for the evaluation of trace evidence for two-level 
normally distributed multivariate data was proposed by Aitken and Lucy (2004) in 
the context of evaluating the elemental composition of glass fragments. To illustrate
this model, denote the mean vector within source $i$ by $\theta_i$. Denote by $W$ the matrix
of within-source variances and covariances. The distribution of $Z_{ij}$ for the
within-source variation is taken to be normal, $Z_{ij} \sim N(\theta_i, W)$. For the between-source
variation, the mean vector between sources is denoted by $\mu$, and the matrix of
between-source variances and covariances by $B$. The distribution of the $\theta_i$ is taken
to be normal, $\theta_i \sim N(\mu, B)$.

Measurements are available on items from an unknown source (recovered 
material) as well as measurements on items from a known source (control material). 
The examined items may or may not come from the same source. Competing 
propositions may be formulated as follows: 


$H_1$: The recovered and the control item originate from the same source.
$H_2$: The recovered and the control item originate from different sources.


Denote the measurements on recovered and control items by, respectively,
$y = (y_1, \ldots, y_{n_y})$ and $x = (x_1, \ldots, x_{n_x})$, where $y_j = (y_{j1}, \ldots, y_{jp})$,
$x_j = (x_{j1}, \ldots, x_{jp})$, and $j = 1, \ldots, n_{y(x)}$. A Bayes factor can be derived as in (1.15):

\[
\mathrm{BF} = \frac{f(y, x \mid H_1)}{f(y, x \mid H_2)}. \tag{3.21}
\]


The distribution of the measurements on the recovered and control materials is taken
to be normal, with mean vectors $\theta_y$ and $\theta_x$, and covariance matrices $W_y$ and $W_x$.
Thus,

\[
(Y \mid \theta_y, W_y) \sim N(\theta_y, W_y); \qquad (X \mid \theta_x, W_x) \sim N(\theta_x, W_x). \tag{3.22}
\]


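The Bayes factor is the ratio of two probability densities of the form $f(y, x \mid H_i) =
f_i(y, x \mid \mu, W, B)$, $i = 1, 2$. The probability density in the numerator is given by

\[
f_1(y, x \mid \mu, W, B) = \int f(y \mid \theta, W)\, f(x \mid \theta, W)\, f(\theta \mid \mu, B)\, d\theta, \tag{3.23}
\]

where

\[
f(y \mid \theta, W) = (2\pi)^{-p n_y/2}\, |W|^{-n_y/2} \exp\left\{ -\frac{1}{2} \sum_{j=1}^{n_y} (y_j - \theta)' W^{-1} (y_j - \theta) \right\}, \tag{3.24}
\]

$f(x \mid \theta, W)$ has the same probabilistic structure as $f(y \mid \theta, W)$, and

\[
f(\theta \mid \mu, B) = (2\pi)^{-p/2}\, |B|^{-1/2} \exp\left\{ -\frac{1}{2} (\theta - \mu)' B^{-1} (\theta - \mu) \right\}. \tag{3.25}
\]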


In the denominator, where y and x are taken to be independent, the probability
density is given by

\[
\begin{aligned}
f_2(y, x \mid \mu, W, B) &= f_2(y \mid \mu, W, B) \times f_2(x \mid \mu, W, B) \\
&= \int f(y \mid \theta, W)\, f(\theta \mid \mu, B)\, d\theta \int f(x \mid \theta, W)\, f(\theta \mid \mu, B)\, d\theta.
\end{aligned} \tag{3.26}
\]

This is equivalent to the algebraic expression of the Bayes factor in (1.23). In the
numerator, under proposition $H_1$, the source means $\theta_y$ and $\theta_x$ are assumed equal,
say $\theta_y = \theta_x = \theta$. In the denominator, under proposition $H_2$, the source means $\theta_y$
and $\theta_x$ are assumed to be different.

The integrals in (3.23) and (3.26) have an analytical solution. A proof is given by 
Aitken and Lucy (2004). The numerator can be shown to be equal to 


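\[
\begin{aligned}
f_1(y, x \mid H_1) = {} & |2\pi W|^{-(n_y + n_x)/2}\, |2\pi B|^{-1/2}\, \left| 2\pi \left[ (n_y + n_x) W^{-1} + B^{-1} \right]^{-1} \right|^{1/2} \\
& \times \exp\left\{ -\frac{1}{2} \left[ F_1 + F_2 + \mathrm{tr}(S_y W^{-1}) + \mathrm{tr}(S_x W^{-1}) \right] \right\},
\end{aligned} \tag{3.27}
\]

where:

\[
F_1 = (w - \mu)' \left( \frac{W}{n_y + n_x} + B \right)^{-1} (w - \mu),
\]

\[
F_2 = (\bar{y} - \bar{x})' \left( \frac{W}{n_y} + \frac{W}{n_x} \right)^{-1} (\bar{y} - \bar{x}),
\]

\[
w = \frac{1}{n_y + n_x} \left( \sum_{j=1}^{n_y} y_j + \sum_{j=1}^{n_x} x_j \right), \qquad
\bar{y} = \frac{1}{n_y} \sum_{j=1}^{n_y} y_j, \qquad
\bar{x} = \frac{1}{n_x} \sum_{j=1}^{n_x} x_j,
\]

\[
S_y = \sum_{j=1}^{n_y} (y_j - \bar{y})(y_j - \bar{y})', \qquad
S_x = \sum_{j=1}^{n_x} (x_j - \bar{x})(x_j - \bar{x})'.
\]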

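Consider the first factor in the denominator, $f_2(y \mid \mu, W, B)$. It can be obtained as

\[
\begin{aligned}
f_2(y \mid \mu, W, B) = {} & |2\pi W|^{-n_y/2}\, |2\pi B|^{-1/2}\, \left| 2\pi (n_y W^{-1} + B^{-1})^{-1} \right|^{1/2} \\
& \times \exp\left\{ -\frac{1}{2} \left[ (\bar{y} - \mu)' (n_y^{-1} W + B)^{-1} (\bar{y} - \mu) + \mathrm{tr}(S_y W^{-1}) \right] \right\}.
\end{aligned} \tag{3.28}
\]

The second factor $f_2(x \mid \mu, W, B)$ can be obtained analogously as

\[
\begin{aligned}
f_2(x \mid \mu, W, B) = {} & |2\pi W|^{-n_x/2}\, |2\pi B|^{-1/2}\, \left| 2\pi (n_x W^{-1} + B^{-1})^{-1} \right|^{1/2} \\
& \times \exp\left\{ -\frac{1}{2} \left[ (\bar{x} - \mu)' (n_x^{-1} W + B)^{-1} (\bar{x} - \mu) + \mathrm{tr}(S_x W^{-1}) \right] \right\}.
\end{aligned} \tag{3.29}
\]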
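The Bayes factor in (3.21) then is the ratio between (3.27) and the product
of (3.28) and (3.29). After some manipulation, the BF can be
obtained as the ratio between

\[
\left| 2\pi \left[ (n_y + n_x) W^{-1} + B^{-1} \right]^{-1} \right|^{1/2} \exp\left\{ -\frac{1}{2} (F_1 + F_2) \right\} \tag{3.30}
\]

and

\[
|2\pi B|^{-1/2}\, \left| 2\pi (n_y W^{-1} + B^{-1})^{-1} \right|^{1/2} \left| 2\pi (n_x W^{-1} + B^{-1})^{-1} \right|^{1/2}
\exp\left\{ -\frac{1}{2} (F_3 + F_4) \right\}, \tag{3.31}
\]

where:

\[
F_3 = (\mu - w^*)' \left[ \left( \frac{W}{n_y} + B \right)^{-1} + \left( \frac{W}{n_x} + B \right)^{-1} \right] (\mu - w^*),
\]

\[
F_4 = (\bar{y} - \bar{x})' \left( \frac{W}{n_y} + \frac{W}{n_x} + 2B \right)^{-1} (\bar{y} - \bar{x}),
\]

\[
w^* = \left[ \left( \frac{W}{n_y} + B \right)^{-1} + \left( \frac{W}{n_x} + B \right)^{-1} \right]^{-1}
\left[ \left( \frac{W}{n_y} + B \right)^{-1} \bar{y} + \left( \frac{W}{n_x} + B \right)^{-1} \bar{x} \right].
\]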
The mean vector between sources $\mu$, the within-source covariance matrix $W$,
and the between-source covariance matrix $B$ can be estimated using the available
background data:

\[
\hat{\mu} = \bar{z} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} z_{ij}, \tag{3.32}
\]

\[
\hat{W} = \frac{1}{m(n - 1)} \sum_{i=1}^{m} \sum_{j=1}^{n} (z_{ij} - \bar{z}_i)(z_{ij} - \bar{z}_i)', \tag{3.33}
\]

\[
\hat{B} = \frac{1}{m - 1} \sum_{i=1}^{m} (\bar{z}_i - \bar{z})(\bar{z}_i - \bar{z})' - \frac{\hat{W}}{n}, \tag{3.34}
\]

where $\bar{z}_i = \frac{1}{n} \sum_{j=1}^{n} z_{ij}$.
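
The estimators in (3.32), (3.33), and (3.34) can be sketched in a few lines of base R. This is a minimal illustration on simulated data, assuming a balanced design (the same number n of items for every source). The function name estimate_two_level and the simulated values are hypothetical; the element names of the returned list only mirror those used with two.level.mv.WB in this chapter, and the code is not the routine distributed with the book.

```r
# Simulated background data (assumption): m sources, n items each, p variables
set.seed(123)
m <- 40; n <- 5; p <- 2
mu_true <- c(4, -1)
theta <- matrix(rnorm(m * p, sd = 0.5), m, p) +
  matrix(mu_true, m, p, byrow = TRUE)          # source means
src <- rep(1:m, each = n)                      # source identifier of each row
z <- theta[src, ] + matrix(rnorm(m * n * p, sd = 0.1), m * n, p)

# Hypothetical helper, not the book's two.level.mv.WB
estimate_two_level <- function(z, src) {
  z <- as.matrix(z)
  m <- length(unique(src))
  n <- nrow(z) / m                             # balanced design assumed
  zbar_i <- rowsum(z, src) / n                 # m x p matrix of group means
  zbar <- colMeans(z)                          # overall mean, (3.32)
  resid <- z - zbar_i[as.character(src), , drop = FALSE]
  W <- crossprod(resid) / (m * (n - 1))        # within-source covariance, (3.33)
  dev <- sweep(zbar_i, 2, zbar)                # group means minus overall mean
  B <- crossprod(dev) / (m - 1) - W / n        # between-source covariance, (3.34)
  list(all.means = zbar, W = W, B = B, group.means = zbar_i)
}

est <- estimate_two_level(z, src)
```

With a small number of sources, the subtraction of W/n in (3.34) can in principle yield an estimate of B that is not positive definite, so it is worth inspecting est$B before using it further.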


Example 3.12 (Glass Evidence) Consider a case in which two glass frag- 
ments are recovered on the jacket of an individual who is suspected to be 
involved in a crime. Two glass fragments are collected at the crime scene for 
comparative purposes. The competing propositions are: 


$H_1$: The recovered and known glass fragments originate from the same
source (broken window at the crime scene).

$H_2$: The recovered and known glass fragments originate from different
sources.


For each fragment, three variables are considered: the logarithmic trans- 
formation of the ratios Ca/K, Ca/Si, and Ca/Fe (Aitken and Lucy, 2004). 
Two replicate measurements are available for each fragment. Measurements 
on the two recovered fragments are 


\[
y_1 = \begin{pmatrix} 3.77379 \\ -0.89063 \\ 2.62038 \end{pmatrix}, \qquad
y_2 = \begin{pmatrix} 3.93937 \\ -0.89343 \\ 2.63860 \end{pmatrix}.
\]


Measurements on the two control fragments are 


\[
x_1 = \begin{pmatrix} 3.84396 \\ -0.91010 \\ 2.65437 \end{pmatrix}, \qquad
x_2 = \begin{pmatrix} 3.72493 \\ -0.89811 \\ 2.61933 \end{pmatrix}.
\]


Consider the database named glass-data.txt. This database is part 
of the supplementary material of Aitken and Lucy (2004) and contains n = 5 
replicate measurements of the elemental concentration of glass fragments
from several windows (m = 62). The variables of interest (i.e., the logarithmic
transformation of the ratios Ca/K, Ca/Si, and Ca/Fe) are displayed in 
columns 6, 7 and 8, while the object (window) identifier is in column 9. 


> population=read.table("glass-data.txt", header=T) 
> variables=c (6,7, 8) 
> grouping.item=9 


Measurements from the recovered fragments, y = (y1, y2), and measure- 
ments from the control fragments, x = (X1, X2), were selected from the 
available replicate measurements for the first group (window). The first two 
replicate measurements were selected to act as recovered data, while the last 
two replicate measurements were selected to act as control data 


> item=1
> recovered=population[which(population[,grouping.item]
+ ==item),][1:2,variables]
> recovered

   logCaK  logCaSi logCaFe
1 3.77379 -0.89063 2.62038
2 3.93937 -0.89343 2.63860


> control=population[which(population[,grouping.item]
+ ==item),][4:5,variables]
> control

   logCaK  logCaSi logCaFe
4 3.72493 -0.89811 2.61933
5 3.66573 -0.88969 2.63292


Data concerning measurements from the first window were then excluded 
from the database 


> pop.back <- population[-which(population[,grouping.item]
+ ==item),]


The database named pop.back will serve as background data and can be
used to estimate the model parameters $\mu$, $W$ and $B$ as in (3.32), (3.33),
and (3.34) by means of the function two.level.mv.WB contained
in the routines file two_level_functions.r. This file is part of
the supplementary materials available on the website of this book (on
http://link.springer.com/) and can be run in the R console by
inserting the command


> source("two_level_functions.r")


The mean vector between sources, the within-source covariance matrix, and 
the between-source covariance matrix can therefore be obtained as follows: 


> WB <- two.level.mv.WB(pop.back,variables,
+ grouping.item)
> mu <- WB$all.means
> W <- WB$W
> B <- WB$B
> mu

      logCaK    logCaSi  logCaFe
[1,] 4.20495 -0.7425402 2.770238


> W

              logCaK      logCaSi      logCaFe
logCaK  1.688046e-02 2.792714e-05 2.783344e-04
logCaSi 2.792714e-05 6.545540e-05 8.362677e-06
logCaFe 2.783344e-04 8.362677e-06 1.294188e-03


> B

             logCaK      logCaSi      logCaFe
logCaK   0.71485025  0.099343866 -0.047824106
logCaSi  0.09934387  0.062724678 -0.007360187
logCaFe -0.04782411 -0.007360187  0.102438334


The Bayes factor can be calculated as the ratio between (3.30)
and (3.31) using the function two.level.mvn.BF available in the
routines file two_level_functions.r. This function is part of the
supplementary materials available on the website of this book (on
http://link.springer.com/). First, it is necessary to calculate the
sample means $\bar{y}$ and $\bar{x}$ and to determine the sample sizes $n_y$ and $n_x$


> ybar=as.vector(colMeans(recovered))
> xbar=as.vector(colMeans(control))
> ny=dim(recovered)[1]
> nx=dim(control)[1]


The Bayes factor can be obtained as


> BF=two.level.mvn.BF(W,B,mu,xbar,ybar,nx,ny)
> BF


[1] 157.6265


This Bayes factor represents moderately strong support for the proposition 
according to which the recovered and the control fragments originate from 
the same source, rather than from different sources. This is expected because 
the compared measurements refer to the same fragment. 
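
To make the computation behind this example more transparent, the ratio of (3.30) to (3.31) can be coded directly. The sketch below is an illustration under stated assumptions: the function name bf_two_level_normal and the toy inputs are hypothetical, the argument order simply follows the call shown above, and the code is not the two.level.mvn.BF routine from the supplementary materials. Determinants are handled on the log scale to limit numerical underflow.

```r
# Hypothetical helper (not the book's two.level.mvn.BF):
# Bayes factor as the ratio of (3.30) to (3.31)
bf_two_level_normal <- function(W, B, mu, xbar, ybar, nx, ny) {
  W <- as.matrix(W); B <- as.matrix(B)
  mu <- as.numeric(mu); xbar <- as.numeric(xbar); ybar <- as.numeric(ybar)
  Winv <- solve(W); Binv <- solve(B)
  # log-determinant of 2*pi*M
  ldet2pi <- function(M) {
    as.numeric(determinant(2 * pi * M, logarithm = TRUE)$modulus)
  }
  # quadratic form v' M^{-1} v
  qf <- function(v, M) sum(v * solve(M, v))
  N <- nx + ny
  w <- (ny * ybar + nx * xbar) / N                 # pooled mean of all items
  # numerator, (3.30)
  F1 <- qf(w - mu, W / N + B)
  F2 <- qf(ybar - xbar, W / ny + W / nx)
  log_num <- 0.5 * ldet2pi(solve(N * Winv + Binv)) - 0.5 * (F1 + F2)
  # denominator, (3.31)
  Ay <- solve(W / ny + B); Ax <- solve(W / nx + B)
  wstar <- as.numeric(solve(Ay + Ax, Ay %*% ybar + Ax %*% xbar))
  F3 <- sum((mu - wstar) * ((Ay + Ax) %*% (mu - wstar)))
  F4 <- qf(ybar - xbar, W / ny + W / nx + 2 * B)
  log_den <- -0.5 * ldet2pi(B) +
    0.5 * ldet2pi(solve(ny * Winv + Binv)) +
    0.5 * ldet2pi(solve(nx * Winv + Binv)) - 0.5 * (F3 + F4)
  exp(log_num - log_den)
}

# Toy inputs (made up, p = 2)
W <- matrix(c(0.02, 0.001, 0.001, 0.01), 2, 2)
B <- matrix(c(0.7, 0.1, 0.1, 0.3), 2, 2)
mu <- c(4.2, -0.7)
bf_close <- bf_two_level_normal(W, B, mu, xbar = c(3.80, -0.90),
                                ybar = c(3.85, -0.89), nx = 2, ny = 2)
bf_far <- bf_two_level_normal(W, B, mu, xbar = c(3.80, -0.90),
                              ybar = c(5.00, -0.20), nx = 2, ny = 2)
```

In this toy setting, sample means that are close relative to the within-source variation give a value above 1 (bf_close), while means that are far apart give a value below 1 (bf_far).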


3.4.1.2 Non-normal Distribution for the Between-Source Variability 


The two-level random effect model presented in the previous section is based 
on the assumption of normality of the between-source variability. However, in 
many practical applications, observations or measurements do not exhibit (enough) 
regularity for standard parametric models to be used. For example, a multivariate 
normal distribution for the mean vector $\theta$ may be difficult to justify. It can be
replaced by a kernel density estimate, which is sensitive to multimodality and 
skewness, and which may provide a better representation of the available data. 

Starting from a database $\{z_{ij} = (z_{ij1}, \ldots, z_{ijp});\ i = 1, \ldots, m \text{ and } j =
1, \ldots, n\}$, the estimate of the probability density distribution for the
between-source variability can be obtained as follows:

\[
\hat{f}(\theta \mid \bar{z}_1, \ldots, \bar{z}_m, B, h) = \frac{1}{m} \sum_{i=1}^{m} K(\theta \mid \bar{z}_i, B, h), \tag{3.35}
\]

where the kernel density function $K(\theta \mid \bar{z}_i, B, h)$ is taken to be a multivariate
normal distribution centered at the group mean $\bar{z}_i$, with covariance matrix $h^2 B$.
The smoothing parameter $h$ can be estimated as

\[
\hat{h} = \left( \frac{4}{2p + 1} \right)^{1/(p+4)} m^{-1/(p+4)}. \tag{3.36}
\]

See also Silverman (1986) and Scott (1992).
We first write a function hopt that computes the estimate of the smoothing
parameter.


> hopt=function(p,m){
+ h=(4/(2*p+1))^(1/(p+4))*m^(-1/(p+4))
+ return(h)}


Thus, if the number p of variables is set equal to 4 and the number of sources 
m is set equal to 30, the smoothing parameter h can be estimated as in (3.36) 
> p=4 

> m=30 

> hopt (p,m) 


[1] 0.5906593 


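The BF can be obtained as in (3.21), where a multivariate normal distribution
is used for the control and the recovered measurements as in (3.22), and a kernel
distribution for the between-source variability, as in (3.35). The numerator and the
denominator of the BF, $f_1(y, x \mid \mu, W, B)$ and $f_2(y, x \mid \mu, W, B)$, can be obtained
analytically (Aitken and Lucy, 2004). The BF is the ratio between

\[
|B|^{1/2}\, m h^{p}\, \left| n_y W^{-1} + n_x W^{-1} + (h^2 B)^{-1} \right|^{-1/2}
\exp\left\{ -\frac{1}{2} F_2 \right\} \sum_{i=1}^{m} \exp\left\{ -\frac{1}{2} F_i \right\} \tag{3.37}
\]

and

\[
\left| n_y W^{-1} + (h^2 B)^{-1} \right|^{-1/2} \sum_{i=1}^{m} \exp\left\{ -\frac{1}{2} F_{y_i} \right\}
\times \left| n_x W^{-1} + (h^2 B)^{-1} \right|^{-1/2} \sum_{i=1}^{m} \exp\left\{ -\frac{1}{2} F_{x_i} \right\}, \tag{3.38}
\]

where $F_2$ is as defined for (3.27) and:

\[
F_i = (w^* - \bar{z}_i)' \left[ (n_y W^{-1} + n_x W^{-1})^{-1} + h^2 B \right]^{-1} (w^* - \bar{z}_i),
\]

\[
w^* = (n_y W^{-1} + n_x W^{-1})^{-1} (n_y W^{-1} \bar{y} + n_x W^{-1} \bar{x}),
\]

\[
F_{y_i} = (\bar{y} - \bar{z}_i)' \left( \frac{W}{n_y} + h^2 B \right)^{-1} (\bar{y} - \bar{z}_i),
\]

\[
F_{x_i} = (\bar{x} - \bar{z}_i)' \left( \frac{W}{n_x} + h^2 B \right)^{-1} (\bar{x} - \bar{z}_i).
\]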
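
The ratio of (3.37) to (3.38) can also be coded directly. The following sketch is offered under explicit assumptions: the function name bf_two_level_kernel and the toy inputs are hypothetical, the argument order follows the call to two.level.mvk.BF shown in this chapter, and the code is not that routine. The matrix group_means is assumed to hold one row per source and one column per variable.

```r
# Hypothetical helper (not the book's two.level.mvk.BF):
# Bayes factor as the ratio of (3.37) to (3.38)
bf_two_level_kernel <- function(xbar, ybar, nx, ny, W, B, group_means, h) {
  W <- as.matrix(W); B <- as.matrix(B)
  gm <- as.matrix(group_means)                 # m x p matrix of group means
  m <- nrow(gm); p <- ncol(gm)
  Winv <- solve(W); hBinv <- solve(h^2 * B)
  ldet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  # sum over sources of exp(-0.5 * (v - zbar_i)' S^{-1} (v - zbar_i))
  kernel_sum <- function(v, S) {
    D <- sweep(gm, 2, v)                       # rows: zbar_i - v
    sum(exp(-0.5 * rowSums((D %*% solve(S)) * D)))
  }
  N <- nx + ny
  wstar <- (ny * ybar + nx * xbar) / N         # w* simplifies because W is common
  d <- ybar - xbar
  F2 <- sum(d * solve(W / ny + W / nx, d))
  # numerator, (3.37)
  log_num <- 0.5 * ldet(B) + log(m) + p * log(h) -
    0.5 * ldet(N * Winv + hBinv) - 0.5 * F2 +
    log(kernel_sum(wstar, W / N + h^2 * B))
  # denominator, (3.38)
  log_den <- -0.5 * ldet(ny * Winv + hBinv) +
    log(kernel_sum(ybar, W / ny + h^2 * B)) -
    0.5 * ldet(nx * Winv + hBinv) +
    log(kernel_sum(xbar, W / nx + h^2 * B))
  exp(log_num - log_den)
}

# Toy univariate inputs (made up): three sources, p = 1
bf_toy <- bf_two_level_kernel(xbar = 1.2, ybar = 0.9, nx = 3, ny = 2,
                              W = matrix(0.5), B = matrix(2),
                              group_means = matrix(c(0, 1.5, 3), ncol = 1),
                              h = 0.6)
```

For a very large number of sources or very distant group means, the sums of exponentials may underflow; a log-sum-exp formulation would be a natural refinement.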
Example 3.13 (Glass Evidence, Continued) Consider the case examined in
Example 3.12, and suppose a kernel distribution is used to model the between- 
source variability (Aitken and Lucy, 2004). Start from the same database, 
glass-data.txt, covering n replicate measurements of p variables for 
each of m = 62 different sources. The smoothing parameter can be estimated 
using the function hopt, for p = 3. 


> p=3
> m=62
> h=hopt(p,m)
> h


[1] 0.5119462


First, the group means $\bar{z}_i$ must be obtained. They are an output of the function
two.level.mv.WB, previously used to estimate the model parameters.


> group.means=WB$group.means


Here we show only the first six rows of the (m x p) matrix, where each row
represents the means of the measurements $\bar{z}_i = \frac{1}{n} \sum_{j=1}^{n} z_{ij}$.


> head(group.means)


    logCaK   logCaSi  logCaFe
2        … -0.346682 2.445828
3        … -0.890684        …
4 4.092612 -0.801742 2.761072
5        … -0.267606 2.665930
6 4.594812 -0.405718 2.674566
7        … -0.893428 2.898054


The Bayes factor can then be calculated as the ratio between (3.37)
and (3.38) using the function two.level.mvk.BF contained in
the routines file two_level_functions.r. This function is part of
the supplementary materials available on the website of this book (on
http://link.springer.com/).


> source("two_level_functions.r")
> BF=two.level.mvk.BF(xbar,ybar,nx,ny,W,B,group.means,h)
> BF


[| Sil, GOOs 


The Bayes factor represents moderately strong support for the proposition 
according to which the recovered and the control fragments originate from 
the same source, rather than from different sources. 
A detailed comparison and discussion of the performance of these two multivariate
random effect models can be found in Aitken and Lucy (2004). An alternative
approach to the kernel density estimation is presented by Franco-Pedroso et al. 
(2016), modeling the between-source distribution by means of a Gaussian mixture 
model. 

Note that a third level of variability could be considered. In fact, one may wish 
to model separately the variability between replicate measurements from a given 
item originating from a given source (e.g., replicate measurements from a glass 
fragment originating from a given window) and the variability between different 
items originating from a given source (e.g., different glass fragments originating 
from the same window). This aspect will be tackled in Sect. 3.4.4 where three-level 
models will be introduced. 


3.4.1.3 Non-constant Within-Source Variability 


The two-level random effect models presented in Sects. 3.4.1.1 and 3.4.1.2 are 
characterized by the assumption of a constant within-source variability. In other 
words, it was assumed that every single source has the same intra-variability. 
While for some type of trace evidence this assumption is acceptable (e.g., for 
measurements of the elemental composition of glass fragments), a constant within- 
source variation may be more difficult to justify in other forensic domains. Consider, 
for example, the case of handwriting on questioned documents where it is largely 
recognized that intra-variability may vary between writers (Marquis et al., 2006). 

Suppose that a handwritten document of unknown source is available for 
comparative examinations. Handwritten items from a person who is suspected to 
be the writer are collected and analyzed. Multiple characters are analyzed on the 
questioned document and on the known writings of the person of interest. The 
following propositions are defined: 


$H_1$: The person of interest wrote the questioned document.
$H_2$: An unknown person wrote the questioned document.


The distribution of the vector of means within group (source) $\theta_i$ is treated
as explained in Sect. 3.4.1.1, i.e., $(\theta_i \mid \mu, B) \sim N(\mu, B)$. An inverse Wishart
distribution is chosen to model the uncertainty about the within-group covariance
matrix,

\[
(W_i \mid \Omega, \nu) \sim W^{-1}(\Omega, \nu), \tag{3.39}
\]

where $\Omega$ is the scale matrix and $\nu$ is the number of degrees of freedom (Bozza et al., 2008).
The scale matrix $\Omega$ is elicited in a way such that the prior mean of $W_i$ is taken
to be equal to the within-group covariance matrix estimated from the available
background data as in (3.33), while $\mu$ is estimated as in (3.32) and the
between-group covariance matrix is estimated as

\[
\hat{B} = \frac{1}{m - 1} \sum_{i=1}^{m} (\bar{z}_i - \bar{z})(\bar{z}_i - \bar{z})'.
\]



A two-level multivariate random effect model with an inverse Wishart distribu- 
tion, modeling the uncertainty about the within-source covariance matrix, has also 
been proposed by Ommen et al. (2017). 
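
The elicitation of the scale matrix can be written out as a short worked step. It assumes the parametrization of the inverse Wishart distribution given in Press (2005), for which the mean exists when $\nu > 2p + 2$; the numerical values are those used for the handwriting data later in this section.

```latex
\[
E(W_i \mid \Omega, \nu) = \frac{\Omega}{\nu - 2p - 2}
\quad \Longrightarrow \quad
\Omega = (\nu - 2p - 2)\, \hat{W}.
\]
% Worked values: with p = 9 and \nu = 40, \nu - 2p - 2 = 20, so \Omega = 20\,\hat{W}.
```

Under this parametrization, larger values of $\nu$ keep the prior mean at $\hat{W}$ while reducing the dispersion of the prior around it.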

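First, consider the numerator of the Bayes factor in (3.21). If proposition $H_1$
holds, then $\theta_y = \theta_x = \theta$ and $W_y = W_x = W$, and the marginal likelihood is as
follows:

\[
\begin{aligned}
f_1(y, x \mid H_1) &= f(y, x \mid \mu, B, \Omega, \nu) \\
&= \int f(y \mid \theta, W)\, f(x \mid \theta, W)\, f(\theta \mid \mu, B)\, f(W \mid \Omega, \nu)\, d(\theta, W),
\end{aligned} \tag{3.40}
\]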


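where $f(\theta \mid \mu, B)$ is as in (3.25), and

\[
f(W \mid \Omega, \nu) = \frac{c\, |\Omega|^{(\nu - p - 1)/2}}{|W|^{\nu/2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}(W^{-1} \Omega) \right\},
\]

where $c$ is the normalizing constant (e.g., Press, 2005).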
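If proposition $H_2$ holds, then $\theta_y \neq \theta_x$ and $W_y \neq W_x$, and the marginal likelihood
takes the following form:

\[
\begin{aligned}
f_2(y, x \mid H_2) = {} & f(y, x \mid \mu, B, \Omega, \nu) \\
= {} & \int f(y \mid \theta, W)\, f(\theta, W \mid \mu, B, \Omega, \nu)\, d(\theta, W) \\
& \times \int f(x \mid \theta, W)\, f(\theta, W \mid \mu, B, \Omega, \nu)\, d(\theta, W).
\end{aligned} \tag{3.41}
\]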


The Bayes factor is the ratio between the marginal likelihoods in (3.40) and (3.41).
However, these distributions are not available in closed form as the integrals do
not have an analytical solution. Several approaches are available to deal with this
problem. Chib (1995) estimates the marginal likelihood $f(y, x \mid H_i)$ by a direct
application of Bayes theorem, since the marginal likelihood can be seen as the
normalizing constant of the posterior density $f(\theta, W \mid y, x, H_i)$. The marginal
likelihood can therefore be obtained as

\[
f(y, x \mid H_i) = \frac{f(y, x \mid \theta, W)\, f(\theta, W \mid H_i)}{f(\theta, W \mid y, x, H_i)}. \tag{3.42}
\]

While the likelihood function $f(y, x \mid \theta, W)$ and the prior density $f(\theta, W \mid H_i)$
can be easily evaluated at any parameter point $(\theta^*, W^*)$, this is not the case for the
posterior density $f(\theta, W \mid y, x, H_i)$, which is not known in closed form. A Gibbs
sampling algorithm (Sect. 1.8) can be applied to the set of the complete conditional
densities $f(\theta \mid W, y, x, H_i)$ and $f(W \mid \theta, y, x, H_i)$, and the posterior density
$f(\theta, W \mid y, x, H_i)$ can be approximated from the output of the Gibbs sampling
algorithm as $\hat{f}(\theta, W \mid y, x, H_i)$ (Chib, 1995; Bozza et al., 2008; Aitken et al., 2021).

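The marginal likelihood in (3.42) can be estimated at a given parameter point
$(\theta^*, W^*)$ as

\[
\hat{f}(y, x \mid H_i) = \frac{f(y, x \mid \theta^*, W^*)\, f(\theta^*, W^* \mid H_i)}{\hat{f}(\theta^*, W^* \mid y, x, H_i)}.
\]

The Bayes factor is then calculated as

\[
\mathrm{BF} = \frac{\hat{f}(y, x \mid H_1)}{\hat{f}(y, x \mid H_2)}. \tag{3.43}
\]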


As mentioned in Sect. 1.8, many other approaches are available, and their efficiency 
should be studied and compared. 
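
The identity in (3.42) can be illustrated on a deliberately simple model. The sketch below is not the two-level multivariate model of this section: it assumes univariate normal data with mean theta and variance sigma2, a normal prior on theta and an inverse gamma prior on sigma2, all with made-up hyperparameters. It only shows the mechanics of the approach of Chib (1995): run a Gibbs sampler, pick a parameter point, and estimate the posterior ordinate at that point from the sampler output.

```r
set.seed(1)
y <- rnorm(10, mean = 1, sd = 1)     # toy data (assumption)
n <- length(y)
mu0 <- 0; tau2 <- 4                  # prior: theta ~ N(mu0, tau2)
a <- 3; b <- 2                       # prior: sigma2 ~ inverse gamma(a, b)

# log density of an inverse gamma distribution (shape, rate)
ldinvgamma <- function(x, shape, rate) {
  shape * log(rate) - lgamma(shape) - (shape + 1) * log(x) - rate / x
}
# parameters of the full conditional of theta given sigma2
cond_theta <- function(s2) {
  v <- 1 / (n / s2 + 1 / tau2)
  list(mean = v * (sum(y) / s2 + mu0 / tau2), sd = sqrt(v))
}

# Gibbs sampler over the two full conditionals
G <- 5000; burn <- 500
theta <- mean(y)
theta_draws <- numeric(G); s2_draws <- numeric(G)
for (g in 1:G) {
  s2 <- 1 / rgamma(1, shape = a + n / 2, rate = b + sum((y - theta)^2) / 2)
  ct <- cond_theta(s2)
  theta <- rnorm(1, ct$mean, ct$sd)
  theta_draws[g] <- theta; s2_draws[g] <- s2
}
keep <- (burn + 1):G
theta_star <- mean(theta_draws[keep])   # parameter point (theta*, sigma2*)
s2_star <- mean(s2_draws[keep])

# posterior ordinate: f(theta* | y) averaged over draws of sigma2,
# times the exact full conditional f(sigma2* | theta*, y)
log_post_theta <- log(mean(sapply(s2_draws[keep], function(s2) {
  ct <- cond_theta(s2)
  dnorm(theta_star, ct$mean, ct$sd)
})))
log_post_s2 <- ldinvgamma(s2_star, a + n / 2, b + sum((y - theta_star)^2) / 2)

log_lik <- sum(dnorm(y, theta_star, sqrt(s2_star), log = TRUE))
log_prior <- dnorm(theta_star, mu0, sqrt(tau2), log = TRUE) +
  ldinvgamma(s2_star, a, b)
# log marginal likelihood: likelihood times prior over posterior ordinate
log_ml <- log_lik + log_prior - log_post_theta - log_post_s2
```

Repeating the computation under each proposition and taking the ratio of the two estimated marginal likelihoods gives an approximation of the form (3.43). In the multivariate setting the full conditionals are a multivariate normal and an inverse Wishart distribution, but the structure of the calculation is the same.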


Example 3.14 (Handwriting Evidence) Consider a hypothetical case involv- 
ing a handwritten document. Handwritten items from a person of interest 
are available for comparative examinations. The propositions of interest are 
therefore: 


$H_1$: The person of interest wrote the questioned document.
$H_2$: An unknown person wrote the questioned document.


Suppose that $n_1 = 8$ characters of type a are collected from the questioned
document and that $n_2 = 8$ characters of the same type are extracted from
a document originating from the person of interest, taken for comparative 
purposes. The contour shape of loops of handwritten characters can be 
described using a methodology based on Fourier analysis (Marquis et al., 
2005, 2006). In brief, the contour shape of each handwritten character loop 
can be described by means of a set of variables representing the surface and a 
set of harmonics. Each harmonic corresponds to a specific contribution to the 
shape and is defined by an amplitude and a phase, the Fourier descriptors. 

Consider the database named handwriting.txt available on the 
book’s website. It contains data on p = 9 variables (i.e., the surface, the 
amplitude and the phase of the first four harmonics), measured on several 
characters of type a collected from m = 20 writers. The variables of interest 
are displayed in columns 2 to 10. Column 1 contains the item (writer) 
identifier 


> population=read.table('handwriting.txt',
+ header=TRUE)

> names (population) Se writer CT A0 U UAT, Wig U7 o 
UISA WAY UU UAA a yU) 

> variables=2:10 

> grouping.item=1 


In the current example, measurements y on the questioned document and 
measurements x on the control document were randomly selected from the 
available measurements on characters collected from a given writer (i.e., 
writer no. 1). Starting from a total number of, say, $n$ available characters,
$2 \times n_1$ characters have been selected: the first $n_1$ characters serve as recovered
data, while the remaining serve as control data


> item=1
> base=population[which(population[,grouping.item]
+ ==item),]
> nr=dim(base)[1]
> n1=8
> recovered=as.matrix(base[1:n1,variables])
> control=as.matrix(base[(n1+1):(2*n1),variables])


Data concerning measurements from the selected writer were then excluded 
from the database 


> pop.back=population[-which(population[,grouping.item]
+ ==item),]


The database pop.back will serve as background data and can be used
to estimate the model parameters as in Bozza et al. (2008) using the function
two.level.mv.WB available in the file two_level_functions.r.


> source('two_level_functions.r')
> WB = two.level.mv.WB(pop.back,variables,
+ grouping.item,nc=TRUE)
> mu = t(WB$all.means)
> W = WB$W
> B = WB$B

The number of degrees of freedom v of the inverse Wishart distribution is 


chosen so as to reduce the variability of this distribution, centered at the 
within-source covariance matrix estimated as in (3.33). 


> p=9
> nu=40
> Omega=W*(nu-2*p-2)

The Gibbs sampling algorithm is run over 10000 iterations with a burn-in of 
1000. 


> n.iter=10000 
> burn.in=1000 


The Bayes factor in (3.43) can then be calculated using the function 
two.level.mvniw.BF that is part of the supplementary materials. Note 
also that this routine requires other routines that are available in the packages 
MCMCpack (Martin et al., 2021) and mvtnorm (Genz et al., 2020). 


> BF=two.level.mvniw.BF(recovered,control,Omega,B,mu,
+ nu,p,n.iter,burn.in)
> BF


[1] 5543330


The Bayes factor represents extremely strong support for the proposition 
according to which the questioned and the recovered handwritten materials 
originate from the same source, rather than from different sources. A fully 
documented open-source package (Gaborini, 2019) has been developed by 
Gaborini (2021). 


Note that it is important to critically examine large BF values, such as the one 
obtained above. For a discussion about extreme values, see Aitken et al. (2021), 
Hopwood et al. (2012), and Kaye (2009). Moreover, as underlined in Sect. 1.11, 
the marginal likelihood is highly sensitive to the prior assessments and so is the 
BF. In particular, while the overall mean vector, the within- and the between-source 
covariance matrices are estimated from the available background data, the number 
of degrees of freedom of the inverse Wishart distribution are chosen so as to reduce 
the dispersion of the prior. A sensitivity analysis may be performed to assess the 
sensitivity of the BF to different choices of the degrees of freedom v in (3.39). 

The BF may also be sensitive to the MCMC approximation. Figure 3.5 provides 
an illustration of BF variability. Results are based on 50 realizations of the BF 
approximation in (3.43). 


> ns=50
> BFs=matrix(0,nrow=ns,ncol=1)
> for(i in 1:ns){
+ BFs[i]=two.level.mvniw.BF(recovered,control,Omega,B,
+ mu,nu,p,n.iter,burn.in)}
> hist(log(BFs),freq=F,main='',xlab='log(BF)')

Fig. 3.5 Histogram of 50 realizations of the BF approximation in (3.43)

The models discussed here rely on the assumption of independence between
sources, focusing on the inherent variability of features. In the case of questioned
documents (Sect. 3.4.1.3), this amounts to assuming that handwritten material has
been produced without any intention of reproducing someone else's writing style.
The possibility of forgery and/or disguise breaks the independence assumption made
in the denominator. Section 3.4.3 will address this complication.


3.4.2 Assessment of Method Performance 


The results of the procedures described in the previous sections may be sensitive to 
changes in the features of recovered and control materials, the available background 
information, as well as to choices made during probabilistic modeling and prior 
elicitation. A sensitivity analysis may be conducted in order to gain a better 
understanding of the properties of the chosen method. It is fundamental to gain 
an understanding of how well a method performs: if the recovered and control 
data originate from the same source, the BF is expected to be greater than 1. Vice 
versa, if the compared items come from different sources, a BF smaller than 1 is 
expected. 

Several methods exist for the assessment of the performance of the methods for 
evidence evaluation. Commonly encountered measures in this context are rates of 
false negatives (i.e., cases in which the Bayes factor is smaller than 1, supporting
hypothesis $H_2$, when hypothesis $H_1$ holds) and false positives (i.e., cases in which
the Bayes factor is greater than 1, supporting hypothesis $H_1$, when hypothesis
$H_2$ holds). The rate of false negatives is the number of same-source comparisons
with a Bayes factor smaller than 1 divided by the total number of same-source
comparisons. The false positive rate is the number of different-source comparisons
with a Bayes factor greater than 1 divided by the total number of different-source
comparisons. Given a database of cases (e.g., measurements on handwriting
characters) for which the source is known, it is possible to study the behavior of the
Bayes factor as the data pertaining to control and recovered items change.
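
The two rates defined above reduce to simple proportions once Bayes factors are available for comparisons of known origin. A minimal sketch follows; the function name error_rates and the Bayes factor values are made up for illustration and do not come from the handwriting database.

```r
# Hypothetical helper: empirical error rates from Bayes factors
# computed on comparisons whose true origin is known
error_rates <- function(bf_same, bf_diff) {
  c(false_negative = mean(bf_same < 1),   # same source, but BF supports H2
    false_positive = mean(bf_diff > 1))   # different sources, but BF supports H1
}

bf_same <- c(170.9, 5.2e3, 0.4, 8.1e6)    # made-up same-source results
bf_diff <- c(1e-4, 0.02, 3.5, 1e-9, 0.7)  # made-up different-source results
rates <- error_rates(bf_same, bf_diff)
```

Here one of the four same-source comparisons and one of the five different-source comparisons fall on the wrong side of 1, so the rates are 0.25 and 0.2. Such rates summarize only the direction of support, not its magnitude.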

Consider again the questioned document case discussed in Sect. 3.4.1.3. There is 
variability in handwriting, and the reported Bayes factor is sensitive to variability 
of the shape of handwritten characters. This is not surprising as no one writes 
the same word exactly the same way twice. Consider measurements of features of 
handwritten characters of a given writer taken from the available database. These 
measurements are organized into a (n x p) matrix, where n is the number of 
available handwritten characters and p represents the number of features (variables). 
Denote this matrix base. Suppose that, among the n characters, we select a certain
number $2 \times n_1 < n$ of characters, forming a group. Repeating this a certain number
of times leads to multiple groups. On each member (character) within a group, p 
variables are measured. Then we take pairs of groups (i.e., measurements on the 
group members), taken to represent recovered and control data. Then, the Bayes 
factor is calculated for each couple. Here, each couple represents a same-source 
comparison. 


Example 3.15 (Two-Level Model for Handwriting—Assessment of Model Per- 
formance) Recall Example 3.14 where a total number of 16 characters have 
been randomly selected from the available characters collected from a given 
writer (writer no. 1), extracted from the database handwriting.txt. A 
Bayes factor equal to 5543330 was obtained. If different sets of characters 
are extracted, the Bayes factor will be influenced (also) by the within-writer 
variability. 

Suppose now that, for the same writer, ns = 50 distinct groups of 
characters (each of size 16) are drawn and split into groups of size 8 to act 
as questioned and control data. The Bayes factor is calculated for each of the 
50 groups. Clearly, since the sampled measurements originate from the same 
writer, we expect Bayes factors greater than 1. 


> ns=50
> n=dim(base)[1]
> n1=8
> BFs=matrix(0,nrow=ns,ncol=1)
> for (i in 1:ns){
+ ind=sample(1:n,2*n1,replace=F)
+ recovered=as.matrix(base[ind[1:n1],variables])
+ control=as.matrix(base[ind[(n1+1):length(ind)],variables])
+ BFs[i]=two.level.mvniw.BF(recovered,control,Omega,
+ B,mu,nu,p,n.iter,burn.in)
+ }


Figure 3.6 shows a histogram of the results for the ns = 50 groups of 
sampled characters. No false negatives have been observed. The range of the 
BF values obtained is given here below 


> range(BFs)
[1] 1.709027e+02 1.438262e+29 


There is also variability between writers, as no two writers write exactly 
alike. Consider now measurements of features of handwritten characters from 
a different writer, say writer no. 6, drawn from the same database. These 
measurements are stored in a matrix denoted base2. 


> item2=6
> base2=population[which(population[,grouping.item]==
+ item2),]
> n2=dim(base2)[1]


We first estimate the population parameters from the background population 
where both selected writers have been eliminated. 


> pop.back=population[-which(population[,grouping.item]==item|
+ population[,grouping.item]==item2),]
> WB=two.level.mv.WB(pop.back,variables,
+ grouping.item,nc=TRUE)
> mu=t(WB$all.means)
> W=WB$W
> B=WB$B
> Omega=W*(nu-2*p-2)


Next, for each of the two writers, take 50 groups of characters (from base 
and base2). Each group contains 8 members, on each of which p features are 
measured. Then, take a group from each writer and form a so-called known 
different-source pair, and do this multiple times. These draws are taken to 
represent recovered and control data. Then, the Bayes factor is calculated for 
each couple. 


Fig. 3.6 Histogram of log(BF) values for 50 groups, each containing 8 handwritten characters, sampled from a given writer to act as questioned and control datasets


> ns=50
> n=dim(base)[1]
> nc=dim(base2)[1]
> n1=8
> BFs2=matrix(0,nrow=ns,ncol=1)
> for (i in 1:ns) {
+ val.r=sample(1:n,n1)
+ recovered=as.matrix(base[val.r,variables])
+ val.c=sample(1:nc,n1)
+ control=as.matrix(base2[val.c,variables])
+ BFs[i]=two.level.mvniw.BF(recovered,
+ control,Omega,B,
+ mu,nu,p,n.iter,burn.in)
+ }


Figure 3.7 shows a histogram of the results. No false positives have been 
observed. The range of the BF values obtained is 


> range(BFs)
[1] 2.733273e-10 7.034354e-02 
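
The statements "no false negatives" and "no false positives" can be checked mechanically once the Bayes factors of the two simulation runs are stored. The following is a minimal sketch, not part of the book's supplementary routines: the function misleading_rates and the vectors bf_ss and bf_ds are hypothetical names, and the numerical values are invented for illustration only.

```r
# Hypothetical Bayes factors; in practice these would be the values stored
# by the same-source and different-source simulation loops.
bf_ss <- c(1.7e2, 3.1e5, 8.4e11, 2.2e20)     # known same-source pairs
bf_ds <- c(2.7e-10, 4.0e-6, 1.3e-3, 7.0e-2)  # known different-source pairs

# A same-source BF below 1 points in the wrong direction (false negative);
# a different-source BF above 1 is a false positive.
misleading_rates <- function(bf_ss, bf_ds) {
  c(false_negative = mean(bf_ss < 1),
    false_positive = mean(bf_ds > 1))
}

rates <- misleading_rates(bf_ss, bf_ds)
print(rates)

# Ranges are easier to read on the log10 scale.
print(range(log10(bf_ss)))
print(range(log10(bf_ds)))
```

Working on the log10 scale also matches the way the histograms of log(BF) values are presented.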


The variability of BF values for different samples is not surprising because of 
handwriting variability. However, this should not be understood as there being a 
Bayes factor distribution. See, e.g., Morrison (2016), Ommen et al. (2016), and 
Taroni et al. (2016) for a discussion of issues relating to the reporting of the precision 
of forensic likelihood ratios. 

Fig. 3.7 Histogram of 

Over the past decade, several other approaches have been proposed in the forensic 
statistics literature for evaluating the performance of statistical procedures based 
on a likelihood ratio or a Bayes factor. These methods provide a rigorous approach 
to assessing and comparing the performance of evaluative methods prior to using 
them in casework and forensic reporting. See, in particular, Ramos and Gonzalez- 
Rodriguez (2013) and Ramos et al. (2021) for a methodology to measure calibration 
of a set of likelihood ratio values and the concept of Empirical Cross-Entropy for 
representing performance, illustrated using examples from forensic speech analysis. 
These concepts are also discussed by Meuwly et al. (2017) who present a guideline 
for the validation of evaluative methods considering source level propositions. 
Zadora et al. (2014) present performance assessment for physicochemical data in 
the context of trace evidence (e.g., glass). For a recent review, see also Chapter 8 of 
Aitken et al. (2021). 
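
To make the notion of Empirical Cross-Entropy (ECE) more concrete, the sketch below computes it for two sets of Bayes factors obtained under known same-source and known different-source conditions. It assumes the usual definition, in which each set contributes an average logarithmic penalty weighted by the prior probability of the corresponding proposition. The function name ece and the input values are hypothetical, and the code is an illustration rather than the implementation used in the cited references.

```r
# ECE for a given prior probability of H1 (assumed definition):
#   P(H1) * mean(log2(1 + 1 / (BF_ss * O))) + P(H2) * mean(log2(1 + BF_ds * O)),
# where O = P(H1) / P(H2) denotes the prior odds.
ece <- function(bf_ss, bf_ds, prior_h1 = 0.5) {
  odds <- prior_h1 / (1 - prior_h1)
  penalty_ss <- mean(log2(1 + 1 / (bf_ss * odds)))
  penalty_ds <- mean(log2(1 + bf_ds * odds))
  prior_h1 * penalty_ss + (1 - prior_h1) * penalty_ds
}

# Invented values for illustration only.
bf_ss <- c(1.7e2, 3.1e5, 8.4e11)
bf_ds <- c(2.7e-10, 4.0e-6, 7.0e-2)

ece_method  <- ece(bf_ss, bf_ds)            # method under evaluation
ece_neutral <- ece(rep(1, 3), rep(1, 3))    # uninformative method, BF = 1
print(c(method = ece_method, neutral = ece_neutral))
```

A method that always reports BF = 1 serves as a natural reference: at equal prior probabilities its ECE equals 1 bit, and a useful method should fall below that value.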


3.4.3 On the Assumption of Independence Under H2 


The models presented in Sect. 3.4.1 are based on the assumption of independence 
between the questioned and known materials under hypothesis H2. This may be 
reasonable for certain types of evidence and cases, but less so for others. In fact, while 
a physical feature (e.g., the elemental composition of glass fragments) requires 
an external constraint to be altered, a behavioral or biometric feature such as a signature 
can be modified intentionally. 

Consider handwriting as an example. When evaluating results of comparative 
handwriting examination, the case circumstances may be such that there is no issue 
of handwriting features being disguised or the result of an attempt to imitate the 
handwriting of another person. The approach suggested in Sect. 3.4.1.3 may thus 
be applicable. In turn, in case of alleged forgery of signatures, the (unknown) 
writer specifically intends to reproduce features of a target signature. The allegation, 
then, is that a signature is either simulated or disguised, rather than presenting a 
correspondence or similarity with a genuine signature by mere chance alone (Linden 
et al., 2021). In such cases, the Bayes factors previously developed in Sect. 3.4.1 
cannot be used to approach the question of interest here because the assumption of 
independence between sources at the denominator cannot be maintained. It follows 
that one must compute 


\[
\mathrm{BF} = \frac{f(y \mid x, H_1)}{f(y \mid x, H_2)}, \tag{3.44}
\]


as f(y | x, H2), following the above argument, does not simplify to f(y | H2) (see 
also Sect. 1.5.1). 
Consider the following competing propositions: 


H1: The person of interest (POI) produced the questioned signature. 
H2: An unknown person produced the questioned signature, trying to simulate the 
POI's signature. 


If proposition H2 is true, the forensic document examiner has to deal with a 
signature written by someone who has knowledge of the POI’s signature. 

Consider the two-level model in Sect. 3.4.1.3 where the distribution of the 
measurements on the recovered and control data is taken to be Normal, with mean 
vectors $\theta_y$ and $\theta_x$, and covariance matrices $W_y$ and $W_x$:

\[
(Y \mid \theta_y, W_y) \sim N(\theta_y, W_y); \qquad (X \mid \theta_x, W_x) \sim N(\theta_x, W_x). \tag{3.45}
\]


The probability densities at the numerator and denominator of the BF in (3.44) can 
be obtained as 


\[
f(y, x \mid H_i) = f_i(y, x \mid \mu_i, B_i, \Omega_i, \nu_i)
= \int f(y \mid \theta, W)\, f(\theta, W \mid x, \mu_i, B_i, \Omega_i, \nu_i)\, d(\theta, W), \tag{3.46}
\]

where $(\mu_i, B_i)$ and $(\Omega_i, \nu_i)$ are the hyperparameters of the prior distributions 
under the competing propositions (i.e., a normal prior and an inverse Wishart prior 
distribution). The Bayes factor can thus be calculated as 

\[
\mathrm{BF} = \frac{f_1(y, x \mid \mu_1, B_1, \Omega_1, \nu_1)}{f_2(y, x \mid \mu_2, B_2, \Omega_2, \nu_2)}. \tag{3.47}
\]


Two different background databases are needed to inform model parameters 
under the competing propositions: a database of genuine signatures ($z_{ij}$) and a 
database of imitated signatures ($s_{ij}$). Someone who imitates a signature needs to 
work outside their writing habits and movement patterns. Thus, simulated signatures 
do not reflect the same movements and writing features as genuine signatures. 
Model parameter $\mu_i$ can be estimated as in (3.32), and $B_i$ as explained in Sect. 
3.4.1.3. The scale matrix $\Omega_i$ can be chosen so as to center the prior distribution at 
the within-group covariance matrix $W_i$ that can be estimated as in (3.33). 
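
The centering of the prior distribution can be illustrated with a few lines of R. The sketch below assumes an inverse Wishart parameterization whose mean is the scale matrix divided by (nu - 2p - 2), which is the convention implied by the way the scale matrix is computed in the examples of this chapter. The object names W_hat and prior_mean are hypothetical; the numerical values of the within-group covariance matrix are those reported for genuine signatures in Example 3.16.

```r
p  <- 2     # number of variables
nu <- 10    # degrees of freedom of the inverse Wishart prior

# The prior mean exists only if nu exceeds 2p + 2 under this convention.
stopifnot(nu > 2 * p + 2)

# Estimated within-group covariance matrix (values from Example 3.16).
W_hat <- matrix(c(95755.861, -4214.939,
                  -4214.939,  2857.975), nrow = 2, byrow = TRUE)

# Scale matrix chosen so that the prior is centered at W_hat.
Omega <- W_hat * (nu - 2 * p - 2)

# Assumed prior mean of the within-group covariance matrix.
prior_mean <- Omega / (nu - 2 * p - 2)
print(all.equal(prior_mean, W_hat))
```

Larger values of nu concentrate the prior more tightly around the estimated within-group covariance matrix, whereas values close to 2p + 2 express weak prior information.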

The probability densities in (3.46) are not available in closed form but can 
be estimated from the output of a MCMC algorithm following, for example, the 
ideas described in Sect. 3.4.1.3. A Gibbs sampling algorithm is implemented here. 
The routine is different from that developed in Sect. 3.4.1.3 because it calculates 
the BF in (3.47). In this formula, no assumption of independence is made at the 
denominator, and two different databases are used. 


Example 3.16 (Digitally Captured Signatures) Consider a case involving a 
questioned signature on a contract signed on a digital tablet. The person 
of interest denies having signed the contract. Among the multiple features 
that are captured by the digital tablet, the average speed and writing time 
are considered here. See Linden et al. (2021) for a detailed description 
of the experimental conditions. Measurements on the questioned signature 
are y = (4639, 380.42), while measurements on the control signature are 
x = (4460, 323.4787). Note that the first value is the average speed and the 
second is the writing time. 


> quest=c (4639, 380.42) 
> ref=c (4460,323.4787) 


Model parameters under hypothesis H1 (i.e., the mean vector $\mu_1$, the within- 
group covariance matrix $W_1$, and the between-group covariance matrix $B_1$) 
are estimated from an available database of genuine signatures ($z_{ij}$) and are 
given here below. 


> mug=matrix(c(2754.767,511.284),ncol=1)
> Wg=matrix(c(95755.861,-4214.939,-4214.939,
+ 2857.975),byrow=T,nrow=2)
> Bg=matrix(c(3377136,30548.24,30548.24,20335.10),
+ byrow=T,nrow=2)


The trace matrix of the inverse Wishart distribution is then obtained as 


> p=2
> nu=10
> Omegag=Wg*(nu-2*p-2)


In the same way, model parameters under hypothesis H2 are estimated from 
an available database of simulated signatures ($s_{ij}$) and are given here below. 


> mus=matrix(c([illegible],145.0719),ncol=1)
> Ws=matrix(c(14798844,-42412.0995,-42412.0995,
+ 940.0561),byrow=T,nrow=2)
> Bs=matrix(c([illegible],157142237,[illegible],
+ 3691.482),byrow=T,nrow=2)
> Omegas=Ws*(nu-2*p-2)


A Gibbs sampling algorithm is run over 10000 iterations, with a burn-in of 
1000. 


> n.iter=10000
> burn.in=1000


The Bayes factor in (3.44) can then be calculated using the function 
two.level.mvniw2.BF (see supplementary materials). 


> source('two_level_functions.r')
> BF=two.level.mvniw2.BF(quest,ref,Wg,Bg,mug,Ws,Bs,
+ mus,nu,p,n.iter,burn.in)
> BF
[1] 40846.87


The BF represents very strong support for the proposition according to 
which the questioned signature originates from the person of interest rather 
than from an unknown person who attempted to imitate the target signature. 


3.4.4 Three-Level Models 


So far, two-level models have been considered, taking into account the within-source 
and the between-source variability. However, it is not uncommon to encounter 
situations in which the hierarchical ordering shows an additional level of variability, 
e.g., in relation to measurement error. 

Denote again by p the number of variables observed on items of a given 
evidential type. Suppose that continuous measurements of these variables are 
available on a random sample from m sources with s items for each source and 
n replicate measurements on each of the N = ms items. The background data can 
be denoted by $z_{ikj} = (z_{ikj1}, \ldots, z_{ikjp})'$, where i = 1, ..., m indexes the 
sources (e.g., windows, writers), k = 1, ..., s indexes the items within each 
source (e.g., glass fragments, handwritten characters), and j = 1, ..., n indexes the 
replicate measurements on each item. 

A Bayesian statistical model for the evaluation of evidence for three-level 
normally distributed multivariate data was proposed by Aitken et al. (2006), 
focusing on the elemental composition of glass fragments. Denote the mean vector 
within item k in group i as $\theta_{ik}$ and the covariance matrix of replicate measurements 
as W. For the variability of replicate measurements, the distribution of $Z_{ikj}$ is taken 
to be normal, $Z_{ikj} \sim N(\theta_{ik}, W)$. 

Denote by $\mu_i$ the mean vector within group i and by B the within-group 
covariance matrix. The distribution of $\theta_{ik}$ for the within-group variability is taken 
to be normal, $\theta_{ik} \sim N(\mu_i, B)$. 

Denote by $\phi$ the mean vector between groups. Let V denote the between-group 
covariance matrix. For the between-group variability, the distribution of the $\mu_i$ is 
taken to be normal, $\mu_i \sim N(\phi, V)$. 
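
The three levels of variation can be made tangible by simulating background data from the model. The sketch below is an illustration only: the helper rmvn, the function simulate_three_level, and all parameter values are hypothetical, and multivariate normal draws are generated in base R through a Cholesky factor.

```r
set.seed(1)

# Draw one vector from N(mean, Sigma); chol() returns R with t(R) %*% R = Sigma.
rmvn <- function(mean, Sigma) {
  as.vector(mean + t(chol(Sigma)) %*% rnorm(length(mean)))
}

# Simulate m sources, s items per source, n replicate measurements per item.
simulate_three_level <- function(m, s, n, phi, W, B, V) {
  p <- length(phi)
  out <- matrix(NA_real_, nrow = m * s * n, ncol = p + 2)
  row <- 1
  for (i in 1:m) {
    mu_i <- rmvn(phi, V)            # source mean (between-group level)
    for (k in 1:s) {
      theta_ik <- rmvn(mu_i, B)     # item mean (within-group level)
      for (j in 1:n) {              # replicate measurements on the item
        out[row, ] <- c(i, k, rmvn(theta_ik, W))
        row <- row + 1
      }
    }
  }
  colnames(out) <- c("source", "item", paste0("v", 1:p))
  as.data.frame(out)
}

z <- simulate_three_level(m = 5, s = 4, n = 3, phi = c(0, 10),
                          W = diag(c(0.1, 0.1)), B = diag(c(0.5, 0.5)),
                          V = diag(c(4, 4)))
print(head(z))
```

The resulting data frame has the same layout as the background databases used in this section: one column identifying the source, one identifying the item, and one column per measured variable.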

Consider the case described in Sect. 3.4.1, where measurements are available 
on $n_y$ items from an unknown origin as well as measurements on $n_x$ items from 
a known origin. These two groups of items may or may not come from the same 
source. Competing propositions may be formulated as follows: 


H1: The recovered and the control items originate from the same source. 
H2: The recovered and the control items originate from different sources. 


There are $n_1$ replicate measurements available on each of the $n_y$ recovered items. 
Denote the measurement vector by y, where the vector components are denoted 
by $y_{kj}$ (for $k = 1, \ldots, n_y$ and $j = 1, \ldots, n_1$) and $y_{kj} = (y_{kj1}, \ldots, y_{kjp})'$. For 
each of the $n_x$ control items, $n_2$ replicate measurements are available. Denote the 
measurement vector by x, where the vector components are denoted by $x_{kj}$ (for 
$k = 1, \ldots, n_x$ and $j = 1, \ldots, n_2$) and $x_{kj} = (x_{kj1}, \ldots, x_{kjp})'$. 

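The Bayes factor is the ratio of two probability densities of the form $f(y, x \mid 
H_i) = f_i(y, x \mid \phi, W, B, V)$, i = 1, 2. The probability density in the numerator is 
given by 

\[
f_1(y, x \mid \phi, W, B, V)
= \int\!\!\int f(y \mid \theta, W)\, f(x \mid \theta, W)\, f(\theta \mid \mu, B)\, f(\mu \mid \phi, V)\, d\mu\, d\theta, \tag{3.48}
\]

where all probability densities are multivariate normal. 
In the denominator, the probability density is given by 

\[
f_2(y, x \mid \phi, W, B, V)
= \int\!\!\int f(y \mid \theta, W)\, f(\theta \mid \mu, B)\, f(\mu \mid \phi, V)\, d\mu\, d\theta
\times \int\!\!\int f(x \mid \theta, W)\, f(\theta \mid \mu, B)\, f(\mu \mid \phi, V)\, d\mu\, d\theta, \tag{3.49}
\]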


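where all probability densities are multivariate normal. 

As shown by Aitken et al. (2006), the value of the evidence is the ratio of 

\[
|B + V|^{1/2}\, \left| (n_y n_1 + n_x n_2) W^{-1} + (B + V)^{-1} \right|^{-1/2}
\exp\left\{ -\tfrac{1}{2} (F_1 + F_2) \right\} \tag{3.50}
\]

to 

\[
\left| n_y n_1 W^{-1} + (B + V)^{-1} \right|^{-1/2}\, \left| n_x n_2 W^{-1} + (B + V)^{-1} \right|^{-1/2}
\exp\left\{ -\tfrac{1}{2} (F_3 + F_4) \right\}, \tag{3.51}
\]

where: 

\[
\begin{aligned}
F_1 &= (\bar{y} - \bar{x})' \left( \frac{W}{n_y n_1} + \frac{W}{n_x n_2} \right)^{-1} (\bar{y} - \bar{x}), \\
F_2 &= (\bar{w} - \phi)' \left( (n_y n_1 + n_x n_2)^{-1} W + B + V \right)^{-1} (\bar{w} - \phi), \\
F_3 &= (\bar{y} - \phi)' \left[ (n_y n_1)^{-1} W + B + V \right]^{-1} (\bar{y} - \phi), \\
F_4 &= (\bar{x} - \phi)' \left[ (n_x n_2)^{-1} W + B + V \right]^{-1} (\bar{x} - \phi),
\end{aligned}
\]

and $\bar{w} = \dfrac{n_y n_1 \bar{y} + n_x n_2 \bar{x}}{n_y n_1 + n_x n_2}$. 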
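
The ratio of (3.50) to (3.51) can be sketched in a few lines of R. The function below is a hypothetical illustration written for this purpose, not the routine three.level.mvn.BF distributed with the book; its name and argument order are assumptions. It works on the logarithmic scale for numerical stability and accepts scalars (p = 1) as well as vectors and matrices.

```r
three_level_bf_sketch <- function(bary, barx, ny, nx, n1, n2, phi, W, B, V) {
  W <- as.matrix(W)
  C <- as.matrix(B) + as.matrix(V)         # B + V
  a <- ny * n1                             # measurements on recovered items
  b <- nx * n2                             # measurements on control items
  barw <- (a * bary + b * barx) / (a + b)  # weighted overall mean
  Winv <- solve(W)
  Cinv <- solve(C)
  # Quadratic form d' S^{-1} d
  quad <- function(d, S) as.numeric(t(d) %*% solve(S, d))
  F1 <- quad(bary - barx, W / a + W / b)
  F2 <- quad(barw - phi, W / (a + b) + C)
  F3 <- quad(bary - phi, W / a + C)
  F4 <- quad(barx - phi, W / b + C)
  log_num <- 0.5 * log(det(C)) -
    0.5 * log(det((a + b) * Winv + Cinv)) - 0.5 * (F1 + F2)
  log_den <- -0.5 * log(det(a * Winv + Cinv)) -
    0.5 * log(det(b * Winv + Cinv)) - 0.5 * (F3 + F4)
  exp(log_num - log_den)
}

# Univariate toy values (invented): W = B = V = 1, two items with three
# replicates on each side, overall mean 0.
bf_close <- three_level_bf_sketch(0, 0, 2, 2, 3, 3, phi = 0, W = 1, B = 1, V = 1)
bf_far   <- three_level_bf_sketch(5, -5, 2, 2, 3, 3, phi = 0, W = 1, B = 1, V = 1)
print(c(close = bf_close, far = bf_far))
```

When the two sample means coincide with each other, the Bayes factor supports the same-source proposition; when they lie far apart relative to the replicate variability, the term F1 dominates and the Bayes factor drops well below 1.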
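The overall mean $\phi$, the measurement error covariance matrix W, the within-group 
covariance matrix B, and the between-group covariance matrix V can be 
estimated using the available background data: 

\[
\hat{\phi} = \frac{1}{msn} \sum_{i=1}^{m} \sum_{k=1}^{s} \sum_{j=1}^{n} z_{ikj}, \tag{3.52}
\]

\[
\hat{W} = \frac{1}{ms(n-1)} \sum_{i=1}^{m} \sum_{k=1}^{s} \sum_{j=1}^{n} (z_{ikj} - \bar{z}_{ik\cdot})(z_{ikj} - \bar{z}_{ik\cdot})', \tag{3.53}
\]

\[
\hat{B} = \frac{1}{m(s-1)} \sum_{i=1}^{m} \sum_{k=1}^{s} (\bar{z}_{ik\cdot} - \bar{z}_{i\cdot\cdot})(\bar{z}_{ik\cdot} - \bar{z}_{i\cdot\cdot})' - \frac{\hat{W}}{n}, \tag{3.54}
\]

\[
\hat{V} = \frac{1}{m-1} \sum_{i=1}^{m} (\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot})(\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot})' - \frac{\hat{B}}{s} - \frac{\hat{W}}{sn}, \tag{3.55}
\]

where $\bar{z}_{ik\cdot} = \frac{1}{n} \sum_{j=1}^{n} z_{ikj}$, $\bar{z}_{i\cdot\cdot} = \frac{1}{s} \sum_{k=1}^{s} \bar{z}_{ik\cdot}$, and $\bar{z}_{\cdot\cdot\cdot} = \frac{1}{m} \sum_{i=1}^{m} \bar{z}_{i\cdot\cdot}$. 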
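
The estimators (3.52) to (3.55) translate directly into R for a balanced design. The sketch below is a hypothetical illustration (the function name three_level_moments is not part of the book's routines) and assumes that the rows of the data matrix are ordered by source, then by item, then by replicate. The small data set is invented so that the results can be verified by hand.

```r
three_level_moments <- function(z, m, s, n) {
  z <- as.matrix(z)
  src <- rep(1:m, each = s * n)          # source label of each row
  itm <- rep(1:(m * s), each = n)        # item label, unique across sources
  item_means <- rowsum(z, itm) / n       # (m*s) x p matrix of item means
  src_means  <- rowsum(z, src) / (s * n) # m x p matrix of source means
  phi <- colMeans(z)                     # overall mean, as in (3.52)

  # Replicate-level sums of squares and products, as in (3.53)
  W <- crossprod(z - item_means[itm, , drop = FALSE]) / (m * s * (n - 1))

  # Item means around their source mean, corrected for W, as in (3.54)
  item_src <- rep(1:m, each = s)
  B <- crossprod(item_means - src_means[item_src, , drop = FALSE]) /
    (m * (s - 1)) - W / n

  # Source means around the overall mean, corrected for B and W, as in (3.55)
  V <- crossprod(sweep(src_means, 2, phi)) / (m - 1) - B / s - W / (s * n)

  list(phi = phi, W = W, B = B, V = V)
}

# Toy data: m = 2 sources, s = 2 items per source, n = 2 replicates, p = 1.
z <- matrix(c(1, 3, 5, 7, 11, 13, 15, 17), ncol = 1)
est <- three_level_moments(z, m = 2, s = 2, n = 2)
print(est)
```

Because B and V are obtained by subtraction, the estimates are not guaranteed to be positive definite when the number of sources or items is small; this should be checked before they are used in (3.50) and (3.51).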


Example 3.17 (Glass Evidence—Continued) Consider again the case 
described in Example 3.12 where two glass fragments are recovered on the 
jacket of an individual who is suspected to be involved in a crime. Two glass 
fragments are collected at the crime scene for comparative purposes. The 
competing propositions are: 

H1: The recovered and known glass fragments originate from the same 
source (e.g., a broken window). 

H2: The recovered and known glass fragments originate from different 
sources. 

A database named glass-database.txt is available as part of the 
supplementary material of Zadora et al. (2014). It contains measurements 
of the elemental concentration of glass fragments from several windows 
(m = 200). For each source, there are s = 12 fragments with n = 3 
replicate measurements. For each fragment, five variables are considered: the 
logarithmic transformation of the ratios Na/O, Mg/O, Al/O, Si/O, Ca/O. 
The variables of interest are displayed in columns 3, 4,5, 6, and 8, while 
the object (window) identifier is in column 1. The fragment identifier is in 
column 2. 


> population=read.table('glass-database.txt',
+ header=T)
> variables=c(3,4,5,6,8)
> grouping.item=1
> grouping.fragment=2


Three replicate measurements are available for each fragment. Using the 
notation introduced above 


> ny=2
> nx=2
> n1=3
> n2=3


Measurements for the recovered fragments, y, and measurements for the 
control fragments, x, were selected from the available data for the first 
and second group (window) and the first two items (fragments) from these 
windows. Therefore, a BF smaller than 1 is expected. 


> recovered.item=1
> control.item=2
> base_c=population[which(population[,grouping.item]
+ ==control.item),]
> base_r=population[which(population[,grouping.item]
+ ==recovered.item),]
> recovered=base_r[which(base_r[,grouping.fragment]
+ ==1|base_r[,grouping.fragment]==2),
+ c(2,variables)]
> recovered


Data concerning measurements from the first two windows were then 
excluded from the database 


> pop.back <- population[-which(population[,
+ grouping.item]==1|population[,grouping.item]==2),]


The database named pop.back will serve as background data. It can be used 
to estimate the model parameters $\phi$, W, B, and V as in (3.52), (3.53), (3.54) 
and (3.55) by means of the function three.level.mv.WBV contained in 
the routines file three_level_functions.r. This file is part of the 
supplementary materials available on the book's website and can be run in 
the R console with the command 


> source('three_level_functions.r')


The overall mean, the measurement error covariance matrix, the within- 
source covariance matrix, and the between-source covariance matrix can be 
estimated as follows: 


> WBV=three.level.mv.WBV(pop.back,variables,
+ grouping.item,grouping.fragment)
> psi=WBV$overall.means
> W=WBV$W
> B=WBV$B
> V=WBV$V


The Bayes factor can be calculated as the ratio between (3.50) and (3.51) 
using the function three.level.mvn.BF available in the routines file 
three_level_functions.r. This function is part of the supplementary 
materials available on the book's website. 


> BF=three.level.mvn.BF(bary, barx,barw,ny,nx,n1,n2, 
+ psi,W,B,V) 
> BF 


MI 0000053296 


The Bayes factor represents extremely strong support for the proposition 
according to which the recovered and the control fragments originate from 
different sources, rather than from the same source. 


Note that the above development does not take into account the topic of variable 
selection. See Aitken et al. (2006) for a proposal for dimensionality reduction based 
on a probabilistic structure, determined by a graphical model obtained from a scaled 
inverse covariance matrix. 


3.5 Summary of R Functions 


The R functions outlined below have been used in this chapter. 


Functions Available in the Base Package 
colMeans: Forms column means for numeric arrays (or data frames) 


d <name of distribution>, p <name of distribution> (e.g., 
dpois, pnorm): Calculate the density and the cumulative probability for 
many parametric distributions. 


More details can be found in the Help menu, help.start(). 


Functions Available in Other Packages 
dinvgamma in package extraDistr: calculates the density of an inverse gamma 
distribution. 


dstp in package LaplacesDemon: calculates the density of a non-central 
Student t distribution. 


Functions Developed in the Chapter 

hopt: Calculates the estimate $\hat{h}$ of the smoothing parameter h. 
Usage: hopt(p,m). 

Arguments: p, the number of variables; m, the number of sources. 
Output: A scalar value. 


poisg: Computes the density of a Poisson-gamma distribution Pg($\alpha$, $\beta$, 1) at x. 
Usage: poisg(a,b,x). 

Arguments: a, the shape parameter $\alpha$; b, the rate parameter $\beta$; x, a scalar value x. 
Output: A scalar value. 


post_distr: Computes the posterior distribution $N(\mu_x, \tau_x^2)$ of a normal mean $\theta$, 
with $X \sim N(\theta, \sigma^2)$ and $\theta \sim N(\mu, \tau^2)$. 

Usage: post_distr(sigma,n,barx,pm,pv). 

Arguments: sigma, the variance $\sigma^2$ of the observations; n, the number of 
observations; barx, the sample mean $\bar{x}$ of the observations; pm, the mean $\mu$ of the prior 
distribution $N(\mu, \tau^2)$; pv, the variance $\tau^2$ of the prior distribution $N(\mu, \tau^2)$. 

Output: A vector of values, the first is the posterior mean $\mu_x$, the second is the 
posterior variance $\tau_x^2$. 


two.level.mv.WB: Computes the estimate of the overall mean $\mu$, the group 
means $\bar{z}_i$, the within-group covariance matrix W, and the between-group 
covariance matrix B for the two-level model in Sect. 3.4.1. 

Usage: two.level.mv.WB(population,variables,grouping.variable,nc=FALSE). 

Arguments: population, a data frame with N rows and k columns for 
measurements on m sources with n items for each source; variables, a vector 
containing the column indices of the variables to be used; grouping.variable, 
a scalar specifying the variable that is to be used as the grouping factor. By 
default (nc = FALSE), the between-group covariance matrix is estimated as in 
Sect. 3.4.1.1. If nc = TRUE, the between-group covariance matrix is estimated 
as in Sect. 3.4.1.3. 

Output: The group means $\bar{z}_i$, the estimated overall mean $\hat{\mu}$, the estimated within- 
group covariance matrix W, the estimated between-group covariance matrix B. 


two.level.mvn.BF: Computes the BF for a two-level random effect model 
where both the within-source variability and the between-source variability 
are normally distributed, and the within-source covariance matrix is constant 
between sources. 

Usage: two.level.mvn.BF(W,B,mu,xbar,ybar,nx,ny). 

Arguments: W, the within-source covariance matrix; B, the between-source covari- 
ance matrix; mu, the mean vector between sources; xbar, the vector of means 
for the control item; ybar, the vector of means for the recovered item; nx, 
the number of measurements for the control material; ny, the number of 
measurements for the recovered material. 

Output: A scalar value. 


two.level.mvk.BF: Computes the BF for a two-level random effect model 
where the within-source variability is normally distributed, the normal distribu- 
tion for the between-source variability is replaced by a kernel density distribu- 
tion, and the within-source covariance matrix is constant between sources. 

Usage: two.level.mvk.BF(xbar,ybar,nx,ny,W,B,group.means,h). 

Arguments: xbar, the vector of means for the control item; ybar, the vector 
of means for the recovered item; nx, the number of measurements for the 
control material; ny, the number of measurements for the recovered material; W, 
the within-source covariance matrix; B, the between-source covariance matrix; 
group.means, a (m x p) matrix, where each row represents the vector of 
means $\bar{z}_i = \frac{1}{n} \sum_{j=1}^{n} z_{ij}$; h, the smoothing parameter. 

Output: A scalar value. 


two.level.mvniw.BF: Computes the BF for a two-level random effect model 
where both the within-source variability and the between-source variability are 
normally distributed, and the uncertainty about the within-source covariance 
matrix is modeled by an inverse Wishart distribution. 

Usage: two.level.mvniw.BF(quest,ref,O,B,mu,nu,p,n.iter,burn.in). 

Arguments: quest, a (n x p) matrix containing measurements on the questioned 
material; ref, a (n x p) matrix containing measurements on the control material; 
O, the trace matrix of the inverse Wishart distribution; B, the between-source 
covariance matrix; mu, the mean vector between sources; nu, the number 
of degrees of freedom of the inverse Wishart distribution; p, the number of 
variables; n.iter, the number of iterations of the Gibbs sampling algorithm; 
burn.in, the number of discarded iterations. 

Output: A scalar value. 


two.level.mvniw2.BF: Computes the BF for a two-level random effect model 
where both the within-source variability and the between-source variability are 
normally distributed, the uncertainty about the within-source covariance matrix is 
modeled by an inverse Wishart distribution with no assumption of independence 
between questioned and known materials at the denominator (i.e., under H2). 

Usage: two.level.mvniw2.BF(quest,ref,Og,Bg,mug,Os,Bs,mus,nu,p,n.iter,burn.in). 

Arguments: quest, a (n x p) matrix containing measurements on the questioned 
material; ref, a (n x p) matrix containing measurements on the control material; 
Og, the trace matrix of the inverse Wishart distribution from the database 
of genuine (handwritten) material; Bg, the between-source covariance matrix 
from the database of genuine (handwritten) material; mug, the mean vector 
between sources from the database of genuine (handwritten) material; Os, the 
trace matrix of the inverse Wishart distribution from the database of simulated 
(handwritten) material; Bs, the between-source covariance matrix from the 
database of simulated (handwritten) material; mus, the mean vector between 
sources from the database of simulated (handwritten) material; nu, the number 
of degrees of freedom of the inverse Wishart distribution; p, the number of 
variables; n.iter, the number of iterations of the Gibbs sampling algorithm; 
burn.in, the number of discarded iterations. 

Output: A scalar value. 


three.level.mv.WBV: Computes the estimate of the overall mean $\phi$, the 
measurement error covariance matrix W, the within-group covariance matrix B, 
and the between-group covariance matrix V for the three-level model presented 
in Sect. 3.4.4. 

Usage: three.level.mv.WBV(population,variables,grouping.item,grouping.fragment). 

Arguments: population, a data frame with msn rows and k columns collecting 
measurements on m sources with s items for each source and n replicate 
measurements for each item; variables, a vector containing the column 
indices of the variables to be used; grouping.item, a scalar specifying the 
variable that is to be used as the grouping item; grouping.fragment, a 
scalar specifying the variable that is to be used for the grouping fragment. 

Output: The estimated overall mean $\hat{\phi}$, the estimated measurement error covariance 
matrix W, the estimated within-group covariance matrix B, the estimated 
between-group covariance matrix V. 


three.level.mvn.BF: Computes the BF for a three-level random effect model 
where the variation at all three levels is normally distributed. 

Usage: three.level.mvn.BF(bary,barx,barw,ny,nx,n1,n2,psi,W,B,V). 

Arguments: bary, the mean vector of measurements on recovered items; barx, 
the mean vector of measurements on control items; barw, the mean vector of 
measurements; ny, the number of recovered items; nx, the number of control 
items; n1, the number of replicate measurements on each of the recovered items; 
n2, the number of replicate measurements on each of the control items; psi, 
the overall mean vector; W, the replicate measurements covariance matrix; B, the 
within-group covariance matrix; V, the between-source covariance matrix. 

Output: A scalar value. 


Published with the support of the Swiss National Science Foundation (Grant no. 
10BP12_208532/1). 


Chapter 4 
Bayes Factor for Investigative Purposes 


4.1 Introduction 


Forensic laboratories routinely face the problem of classifying items or individuals 
into one of several classes or populations on the basis of available data (e.g., 
measurements of one or more attributes), when no control material is available 
for comparison. As discussed in Sect. 1.6, forensic analyses can provide valuable 
information regarding the category membership of a particular item. For example, 
it may be of interest to classify banknotes seized on a person of interest as either 
banknotes from general circulation or banknotes related to drug trafficking (Wilson 
et al., 2014). The collected material is analyzed (e.g., the degree of contamination 
with cocaine is measured), and results are evaluated in terms of their effect on the 
odds in favor of a proposition H1 according to which the recovered items originate 
from a given population (e.g., banknotes in general circulation), compared to an 
alternative proposition H2 according to which the recovered items originate from 
another population (e.g., banknotes related to drug trafficking). 
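
The effect of a Bayes factor on the odds can be shown with a short calculation. The sketch below is an illustration under stated assumptions: the function posterior_prob is hypothetical, only two mutually exclusive and exhaustive populations are considered, and the Bayes factor and prior probabilities are invented values.

```r
# Posterior probability of H1 from a Bayes factor and a prior probability:
# posterior odds = BF x prior odds.
posterior_prob <- function(bf, prior_h1) {
  posterior_odds <- bf * prior_h1 / (1 - prior_h1)
  posterior_odds / (1 + posterior_odds)
}

# The same Bayes factor leads to different posterior probabilities,
# depending on the prior probability assigned to H1.
priors <- c(0.1, 0.5, 0.9)
print(posterior_prob(bf = 9, prior_h1 = priors))
```

This makes explicit that the Bayes factor measures only the change in the odds; the classification of an item also depends on the prior probabilities assigned to the competing populations.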

An assumption made throughout this chapter is that there is a finite number 
of populations to which an item of interest may belong. Each population will be 
characterized by a member from a family of probability distributions. Data can be 
either discrete or continuous, though for the latter it is easier to find examples and 
applications. There are many instances where the scientific evidence is described by 
several variables, and available measurements take the form of multivariate data. As 
mentioned in Sect. 3.1, data do not always present enough regularity so that standard 
parametric distributions could be used (e.g., the normal model). Moreover, data may 
present a complex dependence structure with several levels of variation. 

This chapter is structured as follows. Sections 4.2 and 4.3 address the problem 
of classification for various types of discrete and continuous data, respectively. 
Section 4.4 presents an extension to continuous multivariate data. Note that most of 
the examples developed in this chapter involve only two populations. An extension 
to more than two propositions is given in Sect. 4.2.2. 


4.2 Discrete Data 


This section deals with measurement results in the form of counts, using the 
binomial model (Sect. 4.2.1) and the multinomial model (Sect. 4.2.2). 


4.2.1 Binomial Model 


Imagine a case in which the issue is the quality of a consignment of Basmati rice. 
Basmati is a rice variety originating from the Indian subcontinent that became 
valuable in international trade in the last decades. This prompted the cultivation 
of high-yielding Basmati derivatives. Traditional and evolved (non-traditional) 
varieties, however, have distinct characteristics (e.g., Kamath et al., 2008), and 
distinguishing between varieties may be a relevant analytical task. Given a batch of 
Basmati rice of unknown type, the following pair of propositions may be of interest: 


H1: The batch is traditional Basmati rice. 
H2: The batch is non-traditional Basmati rice. 


Denote by $\theta_1$ and $\theta_2$ the proportion of chalky grains in the two populations, 
respectively. Available counts can be treated as realizations of Bernoulli trials 
(Sect. 2.2.1) with constant probability of success $\theta_1$ ($\theta_2$). Suppose a conjugate beta 
prior distribution Be($\alpha_i$, $\beta_i$) is used to model uncertainty about $\theta_i$, where $\alpha_i$ and $\beta_i$ 
can be elicited using the available background knowledge (as in Sect. 1.10). 

Among several characteristics of interest, such as grain length, thickness, weight, 
etc., is the percentage of chalky grains, determined by counting the number of grains 
having chalky area. A sample of size n is inspected, and a total number y of chalky 
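grains are observed. This can be treated as a realization of a binomial distribution 
Bin(n, $\theta_i$). 

The marginal distribution at the numerator and denominator can be computed as 
in (1.25): 

\[
f_{H_i}(y) = \binom{n}{y} \frac{\Gamma(\alpha_i + \beta_i)\, \Gamma(\alpha_i + y)\, \Gamma(\beta_i + n - y)}{\Gamma(\alpha_i)\, \Gamma(\beta_i)\, \Gamma(\alpha_i + n + \beta_i)}.
\]

This is a beta-binomial distribution with parameters n, $\alpha_i$, and $\beta_i$. The Bayes factor 
in favor of proposition H1 can be computed as in (1.26) and becomes 

\[
\frac{f_{H_1}(y)}{f_{H_2}(y)} = \frac{\Gamma(\alpha_1 + \beta_1)\, \Gamma(\alpha_1 + y)\, \Gamma(\beta_1 + n - y)\, \Gamma(\alpha_2)\, \Gamma(\beta_2)\, \Gamma(\alpha_2 + n + \beta_2)}{\Gamma(\alpha_2 + \beta_2)\, \Gamma(\alpha_2 + y)\, \Gamma(\beta_2 + n - y)\, \Gamma(\alpha_1)\, \Gamma(\beta_1)\, \Gamma(\alpha_1 + n + \beta_1)}. \tag{4.1}
\]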
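
Since the binomial coefficient cancels in the ratio, (4.1) can be evaluated with beta functions only, and working on the logarithmic scale avoids overflow of the gamma function for large n. The sketch below is an illustration in base R, not the code used in the book: the function names beta_from_moments and log_bb_kernel are hypothetical, the hyperparameters are obtained by moment matching from a prior mean and variance, and the counts and prior moments are those used in Example 4.1.

```r
# Beta hyperparameters from a prior mean m and variance v (moment matching).
beta_from_moments <- function(m, v) {
  k <- m * (1 - m) / v - 1
  c(a = m * k, b = (1 - m) * k)
}

# Logarithm of B(a + y, b + n - y) / B(a, b), i.e., the beta-binomial
# probability without the binomial coefficient.
log_bb_kernel <- function(y, n, a, b) {
  lbeta(a + y, b + n - y) - lbeta(a, b)
}

n <- 500   # grains examined
y <- 200   # chalky grains observed
ab1 <- beta_from_moments(0.51, 0.19^2)   # prior under H1
ab2 <- beta_from_moments(0.39, 0.31^2)   # prior under H2

BF <- exp(log_bb_kernel(y, n, ab1[["a"]], ab1[["b"]]) -
          log_bb_kernel(y, n, ab2[["a"]], ab2[["b"]]))
print(BF)
```

The result can be compared with the value obtained in Example 4.1 by means of the function dbbinom of the package extraDistr.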


Example 4.1 (Basmati Rice) Consider a case where 500 rice grains are 
examined and a total of 200 chalky grains are counted. 


> n=500 
> y=200 


Suppose that the prior distribution for the proportion $\theta_1$ of chalky grains in 
traditional varieties can be centered at 0.51 with a standard deviation equal 
to 0.19, while the proportion $\theta_2$ of chalky grains in non-traditional varieties 
can be centered at 0.39 with a standard deviation equal to 0.31. The prior 
parameters ($\alpha_i$, $\beta_i$) can be elicited as in (1.38) and (1.39). 


> m1=0.51
> s1=0.19
> m2=0.39
> s2=0.31


We first write a function beta_prior that computes the prior parameters 
$\alpha_i$ and $\beta_i$ according to (1.38) and (1.39). 


> beta_prior=function(m,v){
+ a=m*(m*(1-m)/v-1)
+ b=(1-m)*(m*(1-m)/v-1)
+ return(c(a,b))}


The hyperparameters of the two beta distributions, say $\alpha_1$, $\beta_1$, $\alpha_2$, and $\beta_2$, can 
then be obtained straightforwardly as 


> ab1=beta_prior(m1,s1^2)
> ab2=beta_prior(m2,s2^2)


The beta-binomial distribution can be calculated straightforwardly using 
the function dbbinom that is available in the package extraDistr 
(Wolodzko, 2020). 


> library (extraDistr) 

> BF=dbbinom(y,n,ab1[1],ab1[2])/dbbinom(y,n,ab2[1], 
+ ab2[2]) 

> BF 


[1] 2.009102


The Bayes factor provides weak support for the hypothesis that the rice 
type is traditional rather than non-traditional. 
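
The Bayes factor in (4.1) can also be evaluated with base R only, which may serve as a cross-check of the package-based computation. The following is a minimal sketch under the assumptions of Example 4.1; the helper name log_marginal_bb is hypothetical, and the identity $\Gamma(a)\Gamma(b)/\Gamma(a+b) = B(a, b)$ is used so that the marginal likelihood can be written with lbeta() on the log scale.

```r
# Log marginal likelihood of y successes in n trials under a Be(a, b) prior
# (beta-binomial), written with lchoose() and lbeta() for numerical stability.
log_marginal_bb <- function(y, n, a, b) {
  lchoose(n, y) + lbeta(a + y, b + n - y) - lbeta(a, b)
}

# Moment matching: mean m and variance v of a beta distribution to (alpha, beta).
beta_prior <- function(m, v) {
  s <- m * (1 - m) / v - 1
  c(alpha = m * s, beta = (1 - m) * s)
}

n <- 500; y <- 200
ab1 <- beta_prior(0.51, 0.19^2)   # traditional varieties
ab2 <- beta_prior(0.39, 0.31^2)   # non-traditional varieties

log_BF <- log_marginal_bb(y, n, ab1[1], ab1[2]) -
  log_marginal_bb(y, n, ab2[1], ab2[2])
BF <- unname(exp(log_BF))
BF
```

Working on the log scale avoids overflow of the gamma functions for large counts; the binomial coefficient cancels in the ratio and is kept only so that each term is a proper marginal probability.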


4.2.2 Multinomial Model


The physical and chemical analysis of gunshot residues (GSR) is a well-established 
field within forensic science. GSR are commonly analyzed to help with issues 
regarding the distance of firing and alleged activities of persons in incidents involv- 
ing the use of firearms. A study by Brozek-Mucha and Jankowicz (2001) focused 
on the use of GSR for discriminating between a selected number of case types 
(i.e., particular combinations of weapon and ammunition). The authors conducted 
experiments using six categories, each consisting of a specific combination of 
weapon and ammunition, called categories A to F. Note that the aim here is not 
to infer a particular weapon and ammunition as the source of recovered GSR 
of unknown source. The purpose is only to provide assistance in discriminating 
between well-defined case types (i.e., categories). 
Consider the following pair of competing propositions: 


$H_1$: The gunshot residue particles are of type D (Beretta pistol and 9 mm Luger
ammunition).

$H_2$: The gunshot residue particles are of type E (Margolin pistol with Sporting
5.6 mm ammunition).


Denote by $\theta_{1j}$ and $\theta_{2j}$ the proportion of particles in given chemical classes,
$j = 1, \dots, k$, characterizing categories D (i.e., category 1) and E (i.e., category 2).
The numbers $n_1, \dots, n_k$ of particles pertaining to distinct chemical classes $1, \dots, k$,
i.e., the chemical classes PbSbBa, PbSb, SbBa, Sb(Sn), Pb, and PbSnPb as specified
in Brozek-Mucha and Jankowicz (2001), can be treated as a realization from a
multinomial distribution $f(n_1, \dots, n_k \mid \theta_{i1}, \dots, \theta_{ik})$, $i = 1, 2$. A conjugate Dirichlet
prior probability distribution $f(\theta_{i1}, \dots, \theta_{ik} \mid \alpha_{i1}, \dots, \alpha_{ik})$ can be considered for
modeling uncertainty about the proportions $\theta_{ij}$, $i = 1, 2$ (Sect. 3.2.2).

The marginal distribution at the numerator and the denominator of the Bayes
factor in (1.26) can be computed as in (1.25) and becomes

\[
f_{H_i}(n_1, \dots, n_k \mid \alpha_{i1}, \dots, \alpha_{ik}) =
\frac{\Gamma(\alpha_i)\,\Gamma(n + 1)}{\Gamma(n + \alpha_i)}
\prod_{j=1}^{k} \frac{\Gamma(n_j + \alpha_{ij})}{\Gamma(\alpha_{ij})\,\Gamma(n_j + 1)},
\]

where $\alpha_i = \sum_{j=1}^{k} \alpha_{ij}$ and $n = \sum_{j=1}^{k} n_j$. This is a Dirichlet-multinomial
distribution with parameters $n$ and $\alpha_{i1}, \dots, \alpha_{ik}$.
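
The Dirichlet-multinomial marginal above translates directly into a few lines of base R. The sketch below is only illustrative: the function names log_ddirmult and bf_dirmult are hypothetical helpers, not functions of any package, and the computation is done on the log scale with lgamma().

```r
# Log density of the Dirichlet-multinomial distribution for a vector of counts
# `counts` and Dirichlet hyperparameters `alpha` (same length, all positive).
log_ddirmult <- function(counts, alpha) {
  stopifnot(length(counts) == length(alpha), all(alpha > 0), all(counts >= 0))
  n <- sum(counts)
  a <- sum(alpha)
  lgamma(a) + lgamma(n + 1) - lgamma(n + a) +
    sum(lgamma(counts + alpha) - lgamma(alpha) - lgamma(counts + 1))
}

# Bayes factor for H1 versus H2: difference of log marginals, exponentiated.
bf_dirmult <- function(counts, alpha1, alpha2) {
  exp(log_ddirmult(counts, alpha1) - log_ddirmult(counts, alpha2))
}

# Sanity check: with k = 2 classes and a uniform prior, every split of n items
# has marginal probability 1 / (n + 1); here n = 10.
p_uniform <- exp(log_ddirmult(c(3, 7), c(1, 1)))
p_uniform
```

Given two hyperparameter vectors, bf_dirmult(counts, alpha1, alpha2) returns the ratio of the two marginal likelihoods for the observed counts.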


From a decision-theoretic point of view, the questioned items can be classified in
category D (decision $d_1$) whenever

\[
BF > \frac{l_1/l_2}{\pi_1/\pi_2},
\tag{4.2}
\]

where $l_1$ ($l_2$) represents the loss incurred when decision $d_1$ ($d_2$) is erroneous, and a
“$0 - l_i$” loss function is chosen (Sect. 1.9 and Table 1.4), while $\pi_1/\pi_2$ is the prior
odds in favor of $H_1$.

It may be objected that the values for $l_1$ and $l_2$ are difficult to assess. However,
what really matters is the ratio $k$ of the actual values, $l_1 = k \cdot l_2$. Note that this is
an asymmetric loss function. In this way, starting from a prior odds equal to 1, the
criterion in (4.2) may be rewritten as follows:

\[
BF > k.
\tag{4.3}
\]

Stated otherwise, whenever the competing hypotheses are considered equally
probable, a priori, the decision $d_1$ will be optimal if $BF > k$, that is, if wrongly
deciding $d_1$ (i.e., $H_2$ holds) is less than BF times worse than wrongly deciding $d_2$
(i.e., $H_1$ holds). Clearly, the prior odds need not be equal to 1, and the
criterion can be adapted accordingly.
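
The criterion in (4.2), and its special case (4.3), can be written as a small helper. This is a sketch only: the function name choose_decision and the numerical values are hypothetical and serve to show how the threshold changes with the loss ratio and the prior odds.

```r
# Choose between d1 (classify under H1) and d2 (classify under H2) with a
# "0 - l_i" loss: l1 is the loss of a wrong d1, l2 the loss of a wrong d2.
choose_decision <- function(BF, l1, l2, prior_odds = 1) {
  threshold <- (l1 / l2) / prior_odds   # right-hand side of (4.2)
  if (BF > threshold) "d1" else "d2"
}

choose_decision(BF = 50, l1 = 10, l2 = 1)                   # k = 10
choose_decision(BF = 50, l1 = 100, l2 = 1)                  # k = 100
choose_decision(BF = 50, l1 = 100, l2 = 1, prior_odds = 4)  # threshold 25
```

With prior odds equal to 1 the threshold reduces to the ratio k of (4.3); prior odds in favor of $H_1$ lower the Bayes factor needed for $d_1$ to be optimal.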


4.2.2.1 Choosing the Parameters of the Dirichlet Prior 


The problem of how to elicit a prior probability distribution about a proportion has 
been discussed in Sect. 1.10. In the type of case considered here, an analyst will face 
the problem of eliciting a prior opinion about a set of proportions, assuming that the 
subjective prior distribution is chosen from the family of Dirichlet distributions. 

There are various options for the hyperparameters $\alpha_{i1}, \dots, \alpha_{ik}$ characterizing
the prior probability distribution on the proportions $\theta_{i1}, \dots, \theta_{ik}$. One is the uniform
prior probability distribution, with $\alpha_{ij} = 1$, $j = 1, \dots, k$. Whenever further
information is available in terms of the number of outcomes in the distinct
categories, e.g., $x_{i1}, \dots, x_{ik}$, the hyperparameters $\alpha_{ij}$ can be updated to $\alpha_{ij} + x_{ij}$.

There are cases, however, where the analyst is able to specify a non-uniform
prior probability distribution about the proportions. Following the methodology
illustrated in Zapata-Vazquez et al. (2014), the prior probability distribution about
a set of proportions $\theta_{i1}, \dots, \theta_{ik}$ can be elicited using tools available in the package
SHELF (Oakley, 2008). The user is only asked to provide a lower (e.g., 0.25), a
median, and an upper (e.g., 0.75) quantile for the marginal densities of proportions
that follow a beta distribution. Details will follow in the next example. The reader
can also refer to O'Hagan et al. (2006), where a practical example is provided.
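
To give an idea of what fitting a beta distribution to three elicited quantiles involves, the following base R sketch minimizes the squared differences between the beta cumulative probabilities at the elicited values and the stated probabilities. It is an illustration under stated assumptions, not the algorithm of the SHELF package: the function name fit_beta_quantiles is hypothetical, exactly three quantiles with the median in the middle are assumed, and the starting values come from a simple normal approximation.

```r
# Fit a beta distribution to elicited quantiles by least squares on the
# cumulative probabilities. Hypothetical helper, not part of SHELF.
fit_beta_quantiles <- function(vals, probs) {
  # Starting values by moment matching: the median is used as the mean, and
  # the spread between the outer quantiles is converted to a standard
  # deviation through the normal approximation.
  m <- vals[2]
  s <- (vals[3] - vals[1]) / (qnorm(probs[3]) - qnorm(probs[1]))
  conc <- m * (1 - m) / s^2 - 1
  start <- log(c(m * conc, (1 - m) * conc))
  # Optimize on the log scale so that both shape parameters stay positive.
  obj <- function(par) sum((pbeta(vals, exp(par[1]), exp(par[2])) - probs)^2)
  fit <- optim(start, obj)
  setNames(exp(fit$par), c("shape1", "shape2"))
}

p <- c(0.25, 0.5, 0.75)
th <- c(66, 68, 70) / 100          # lower quartile, median, upper quartile
ab <- fit_beta_quantiles(th, p)
round(pbeta(th, ab[1], ab[2]), 3)  # should be close to p
```

The fitted shape parameters can then be inspected through their implied mean, ab[1] / sum(ab), which should lie close to the elicited median when the distribution is nearly symmetric.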


Example 4.2 (Gunshot Residue Particles) Consider a case in which a given 
number of particles (266) have been collected and analyzed by a scientist. The 
particles have been collected from a target surface (e.g., a person’s hands). The 
counts of gunshot residue particles are as follows: 


Total number                Chemical classes
of particles   PbSbBa   PbSb   SbBa   Sb(Sn)   Pb   PbSnPb
266                18     36      2      150   38       22


The scientist is asked to help discriminate between the following two
propositions:


$H_1$: The gunshot residue particles are of type D (Beretta pistol with Luger
9 mm ammunition).

$H_2$: The gunshot residue particles are of type E (Margolin pistol with
Sporting 5.6 mm ammunition).


One way to elicit the Dirichlet distribution in the case here is to use 
observed frequencies of particles in various chemical classes as reported in 
previous studies (e.g., Brozek-Mucha & Jankowicz, 2001). Suppose that the 
elicited expert judgments for the marginal proportions characterizing category 
D are as follows: 


Quartiles                Chemical classes
(%)       PbSbBa   PbSb   SbBa   Sb(Sn)     Pb   PbSnPb
Lower       5.00   9.00   0.40       66   9.00     7.60
Median      5.25   9.25   0.45       68   9.25     7.80
Upper       5.50   9.50   0.50       70   9.50     8.00


and those characterizing category E: 


Quartiles                Chemical classes
(%)       PbSbBa   PbSb   SbBa   Sb(Sn)     Pb   PbSnPb
Lower       2.35   7.00   0.13       56     24     5.60
Median      2.55   7.50   0.15       58     26     5.80
Upper       2.75   8.00   0.17       60     28     6.00


Consider, first, the elicitation of the Dirichlet distribution concerning the
first population, Dir($\theta_{11}, \dots, \theta_{1k} \mid \alpha_{11}, \dots, \alpha_{1k}$). Starting from the given
lower, median, and upper quartiles for each marginal proportion, the prior
distribution can be elicited as follows.


> p=c(0.25,0.5,0.75)
> th1=c(5,5.25,5.5)/100
> th2=c(9,9.25,9.5)/100
> th3=c(0.4,0.45,0.5)/100
> th4=c(66,68,70)/100
> th5=c(9,9.25,9.5)/100
> th6=c(7.6,7.8,8)/100


The function fitdist, available in the package SHELF, allows one to fit a
parametric distribution starting from the elicited probabilities. In the example
here, the parameters of the elicited beta distribution for each proportion are of
interest.


> library(SHELF)
> fit1=fitdist(vals = th1, probs = p, 0, 1)
> fit2=fitdist(vals = th2, probs = p, 0, 1)
> fit3=fitdist(vals = th3, probs = p, 0, 1)
> fit4=fitdist(vals = th4, probs = p, 0, 1)
> fit5=fitdist(vals = th5, probs = p, 0, 1)
> fit6=fitdist(vals = th6, probs = p, 0, 1)

The last six objects contain the parameters of the beta distribution that is fitted
for each marginal proportion. For example, the parameters $\alpha_1$ and $\beta_1$ of the
elicited beta distribution of $\theta_{11}$ (i.e., proportion of gunshot residue particles in
category PbSbBa) can be obtained as


> fit1$Beta


    shape1   shape2
1 190.0506 3429.517


Next, fit the Dirichlet distribution to the elicited marginals by means of the
function fitDirichlet that is available in the same package.


> d.fit1=fitDirichlet(fit1,fit2,fit3,fit4,fit5,fit6,
+ categories = c("PbSbBa","PbSb","SbBa","Sb(Sn)",
+ "Pb","PbSnPb"), n.fitted = "min")


Directly elicited beta marginal distributions:

         PbSbBa     PbSb     SbBa   Sb(Sn)       Pb   PbSnPb
shape1 1.90e+02 5.65e+02 3.67e+01 168.0000 5.65e+02 6.38e+02
shape2 3.43e+03 5.54e+03 8.06e+03  79.3000 5.54e+03 7.54e+03
mean   5.25e-02 9.25e-02 4.53e-03   0.6800 9.25e-02 7.80e-02
sd     3.71e-03 3.71e-03 7.46e-04   0.0296 3.71e-03 2.97e-03
sum    3.62e+03 6.11e+03 8.10e+03 248.0000 6.11e+03 8.18e+03

Sum of elicited marginal means: 1

Beta marginal distributions from Dirichlet fit:

         PbSbBa     PbSb     SbBa   Sb(Sn)       Pb   PbSnPb
shape1  13.0000  22.9000 1.12e+00 168.0000  22.9000  19.3000
shape2 235.0000 225.0000 2.46e+02  79.3000 225.0000 229.0000
mean     0.0526   0.0925 4.53e-03   0.6800   0.0925   0.0780
sd       0.0142   0.0184 4.26e-03   0.0296   0.0184   0.0170
sum    248.0000 248.0000 2.48e+02 248.0000 248.0000 248.0000


The Dirichlet parameters $\alpha_{11}, \dots, \alpha_{1k}$ can be read off from the row shape1
of the second table and will be stored in a vector named a1.


> a1=c(13,22.9,1.12,168,22.9,19.3)


The parameter n of the Dirichlet prior is taken to be the smallest value, over the
elicited marginals, of the sum of the two beta parameters (input n.fitted set equal
to "min"). Here this is the sum for the class Sb(Sn), equal to 248. See
Oakley (2008) for more details.
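
One reading of this rule, consistent with the output reported in this example, is that each Dirichlet parameter equals the precision n times the corresponding (rescaled) marginal mean. The sketch below illustrates that reading only; the function dirichlet_from_betas is a hypothetical helper, not the SHELF implementation, and the three-category input values are invented for illustration.

```r
# Combine elicited beta marginals Be(shape1[j], shape2[j]) into Dirichlet
# hyperparameters, taking the precision n as the smallest shape1 + shape2.
dirichlet_from_betas <- function(shape1, shape2) {
  means <- shape1 / (shape1 + shape2)
  means <- means / sum(means)        # rescale so that the means sum to 1
  n <- min(shape1 + shape2)          # conservative ("min") precision
  n * means
}

# Hypothetical three-category illustration (values are not from the example).
alpha <- dirichlet_from_betas(shape1 = c(20, 150, 30), shape2 = c(180, 100, 70))
alpha
sum(alpha)   # equals the smallest marginal precision, here 100
```

Choosing the smallest precision is conservative: the Dirichlet prior is then no more concentrated than the least informative of the elicited marginals.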

In the same way, the Dirichlet distribution concerning the second population,
Dir($\theta_{21}, \dots, \theta_{2k} \mid \alpha_{21}, \dots, \alpha_{2k}$), can be elicited.


> th1=c(2.35,2.55,2.75)/100
> th2=c(7,7.5,8)/100
> th3=c(0.13,0.15,0.17)/100
> th4=c(56,58,60)/100
> th5=c(24,26,28)/100
> th6=c(5.6,5.8,6)/100
> fit1=fitdist(vals = th1, probs = p, 0, 1)
> fit2=fitdist(vals = th2, probs = p, 0, 1)
> fit3=fitdist(vals = th3, probs = p, 0, 1)
> fit4=fitdist(vals = th4, probs = p, 0, 1)
> fit5=fitdist(vals = th5, probs = p, 0, 1)
> fit6=fitdist(vals = th6, probs = p, 0, 1)
> d.fit2=fitDirichlet(fit1,fit2,fit3,fit4,fit5,fit6,
+ categories = c("PbSbBa","PbSb","SbBa","Sb(Sn)",
+ "Pb","PbSnPb"), n.fitted = "min")


The Dirichlet parameters $\alpha_{21}, \dots, \alpha_{2k}$ can be read off analogously from the
row shape1 (not shown here) and will be stored in a vector named a2.


> a2=c(5.59,16.4,0.331,127,57,12.7)


The counts of gunshot residue particles are


> n=c(18,36,2,150,38,22)


The density of a Dirichlet-multinomial distribution can be calculated using 
the function ddirmnom that is available in the package extraDistr 
(Wolodzko, 2020), and the Bayes factor can be obtained straightforwardly 


> library (extraDistr) 
> BF=ddirmnom(n,sum(n),a1)/ddirmnom(n,sum(n), a2) 
> BF 


[1] 658.6326 


The Bayes factor provides moderately strong support for the hypothesis 
that the gunshot residue particles originate from a Beretta pistol with Luger 
9mm ammunition rather than from a Margolin pistol with Sporting 5.6 mm 
ammunition. 

Assume $\pi_1/\pi_2 = 1$. If a “$0 - l_i$” loss function is introduced, then
decision $d_1$, classifying the gunshot residue particles into category D, is to
be preferred to the alternative decision $d_2$ unless wrongly deciding $d_1$ is felt
more than 659 times worse than classifying the particles in category E.


Note that by choosing a “$0 - 1$” loss function, or a symmetric “$0 - l_i$” loss
function with $l_1 = l_2$, a BF greater than 1 (or, more generally, greater than $\pi_2/\pi_1$
for unequal prior probabilities) provides a criterion for addressing the classification
problem. The aim here was to show that when assuming equal prior probabilities
for the hypotheses being compared, then, for the decision $d_2$ to be optimal, it is
not sufficient to have an asymmetric loss function that assigns a loss to the adverse
consequence of decision $d_1$ that is greater than the loss assigned to the adverse
consequence of decision $d_2$. Specifically, this loss must be roughly 659 times greater.


4.2.2.2 More than Two Populations


Consider now the case where more than two weapons (and related ammunitions) 
could be at the origin of the collected gunshot particles. Suppose that a third weapon 
is taken into consideration and that the competing propositions are specified as 
follows: 


$H_1$: The gunshot residue particles are of type D (Beretta pistol with Luger 9 mm
ammunition; population $p_1$).

$H_2$: The gunshot residue particles are of type E (Margolin pistol with Sporting
5.6 mm ammunition; population $p_2$).

$H_3$: The gunshot residue particles are of type F (TT-33 pistol with Tokarev
7.62 mm ammunition; population $p_3$).


As discussed in Sect. 1.6, the expert may calculate the marginal likelihood $f_{H_i}(y)$
(i.e., a Dirichlet-multinomial distribution) for each proposition and report a scaled
version as in (1.27), that is,

\[
f^{*}_{H_i}(y) = \frac{f_{H_i}(y)}{\sum_{j=1}^{3} f_{H_j}(y)},
\]

or the posterior probabilities

\[
\Pr(H_i \mid y) = \frac{\Pr(H_i)\, f_{H_i}(y)}{\sum_{j=1}^{3} \Pr(H_j)\, f_{H_j}(y)}.
\]

Alternatively, the analyst may also consider the possibility of summarizing
propositions $H_2$ and $H_3$ into one as $\bar{H}_1 = H_2 \cup H_3$. A pair of competing propositions may
thus be formulated as follows:


$H_1$: The gunshot residue particles are of type D (Beretta pistol with Luger 9 mm
ammunition; population $p_1$).

$\bar{H}_1$: The gunshot residue particles are of type E (Margolin pistol with Sporting
5.6 mm ammunition; population $p_2$) or of type F (TT-33 pistol with Tokarev
7.62 mm ammunition; population $p_3$).


The Bayes factor can be obtained as in (1.28), that is,

\[
BF = \frac{f_{H_1}(y) \sum_{i=2}^{3} \Pr(p_i)}{f_{\bar{H}_1}(y)},
\tag{4.4}
\]

where

\[
f_{\bar{H}_1}(y) = \sum_{i=2}^{3} \Pr(p_i) \int_{\Theta_i} f(y \mid \theta_i)\, f(\theta_i \mid p_i)\, d\theta_i.
\]


Example 4.3 (Gunshot Residue Particles, Continued) Recall Example 4.2,
and suppose that the elicited expert judgments for the marginal proportions
characterizing category F are as follows:


Quartiles Chemical classes 
(%) PbSbBa PbSb SbBa Sb(Sn) Pb PbSnPb 
Lower 6.00 4.50 3.00 65 14.0 3.00 
Median 6.15 4.75 3.25 67 14.5 3.25 
Upper 6.30 5.00 3.50 69 15.0 3.50 


The Dirichlet distribution concerning this new combination of 
weapon/ammunition can be elicited as before: 


> th1=c(6,6.15,6.30)/100
> th2=c(4.5,4.75,5)/100
> th3=c(3,3.25,3.5)/100
> th4=c(65,67,69)/100
> th5=c(14,14.5,15)/100
> th6=c(3,3.25,3.5)/100
> fit1=fitdist(vals = th1, probs = p, 0, 1)
> fit2=fitdist(vals = th2, probs = p, 0, 1)
> fit3=fitdist(vals = th3, probs = p, 0, 1)
> fit4=fitdist(vals = th4, probs = p, 0, 1)
> fit5=fitdist(vals = th5, probs = p, 0, 1)
> fit6=fitdist(vals = th6, probs = p, 0, 1)
> d.fit3=fitDirichlet(fit1,fit2,fit3,fit4,fit5,fit6,
+ categories = c("PbSbBa","PbSb","SbBa","Sb(Sn)",
+ "Pb","PbSnPb"), n.fitted = "min")


The Dirichlet parameters $\alpha_{31}, \dots, \alpha_{3k}$ can be read off from the row shape1
(not shown here) and will be stored in a vector named a3.


> a3=c(15.7,12.1,8.29,170,36.9,8.29)


The scaled version of the marginal likelihoods can be easily obtained as


> fh1=ddirmnom(n,sum(n),a1)
> fh2=ddirmnom(n,sum(n),a2)
> fh3=ddirmnom(n,sum(n),a3)
> fh1scaled=fh1/(fh1+fh2+fh3)
> fh2scaled=fh2/(fh1+fh2+fh3)
> fh3scaled=fh3/(fh1+fh2+fh3)
> c(fh1scaled,fh2scaled,fh3scaled)


[1] 0.9980356379 0.0015153146 0.0004490475 


Note that the scaled likelihoods $f^{*}_{H_i}(y)$ are equivalent to the posterior
probabilities $\Pr(H_i \mid y)$ whenever the prior probabilities of the three propositions
are equal.

Alternatively, suppose that propositions $H_2$ and $H_3$ are summarized as
above, i.e., $\bar{H}_1 = H_2 \cup H_3$, and that the prior probabilities of $H_1$ and $\bar{H}_1$
are equal, so that $\Pr(H_1) = 0.5$ and $\Pr(H_2) = \Pr(H_3) = 0.25$.


> p2=0.25 
> p3=0.25 


The Bayes factor can then be obtained as 


> fh1=ddirmnom(n,sum(n),a1)
> fh2=p2*ddirmnom(n,sum(n),a2)+p3*
+ ddirmnom(n,sum(n),a3)
> BF=fh1*(p2+p3)/fh2
> BF


[1] 1016.142
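
The quantities of this section depend on the marginal likelihoods only up to a common factor, so they can be recomputed from the scaled values reported in this example. The sketch below uses base R only; the function names posterior and bf_h1_vs_rest are hypothetical helpers introduced for illustration.

```r
# Scaled marginal likelihoods reported in Example 4.3 (H1, H2, H3); a common
# rescaling of the marginal likelihoods leaves the results below unchanged.
f <- c(0.9980356379, 0.0015153146, 0.0004490475)

# Posterior probabilities for a vector of prior probabilities `prior`.
posterior <- function(f, prior) prior * f / sum(prior * f)

# Bayes factor for H1 against the union of the remaining propositions (4.4).
bf_h1_vs_rest <- function(f, prior) {
  rest <- -1                       # negative index: drop the first proposition
  f[1] * sum(prior[rest]) / sum(prior[rest] * f[rest])
}

post_equal <- posterior(f, rep(1 / 3, 3))    # coincides with f / sum(f)
prior <- c(0.5, 0.25, 0.25)
BF <- bf_h1_vs_rest(f, prior)
post <- posterior(f, prior)
c(BF = BF, posterior_odds = post[1] / (1 - post[1]))
```

With prior odds of 1 between $H_1$ and the union of $H_2$ and $H_3$, the posterior odds in favor of $H_1$ coincide with the Bayes factor, which provides a simple internal check of (4.4).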


4.3 Continuous Data 


The previous section considered the evaluation of scientific evidence in the form 
of discrete data for investigative purposes. However, for many types of scientific 
evidence, measurements lead to continuous data. In this section, we discuss 
parametric and non-parametric models for continuous data. 


4.3.1 Normal Model and Known Variance 


Suppose that tablets of unknown source are seized, and the question is whether they 
belong to population A or population B, which differ in color dye concentration. 
The propositions of interest are as follows: 


$H_1$: The seized tablets come from population A.
$H_2$: The seized tablets come from population B.


The measurement of color dye concentration leads to continuous data for which
a normal distribution is considered appropriate, say $X_A \sim N(\theta_A, \sigma_A^2)$ for population
A and $X_B \sim N(\theta_B, \sigma_B^2)$ for population B. Suppose that the variance of color dye
concentration in the different populations is known. For the population means, a
conjugate prior normal distribution is introduced, i.e., $\theta_A \sim N(\mu_A, \tau_A^2)$ and
$\theta_B \sim N(\mu_B, \tau_B^2)$.

The analysis of a tablet of unknown origin yields the measurement $y$. The Bayes
factor can be obtained as in (1.26), where the marginal likelihoods $f_{H_i}(y)$ are still
normal with mean equal to the prior mean $\mu$ and variance equal to the sum of the
prior variance $\tau^2$ and the population variance $\sigma^2$, $f_{H_i}(y) = N(\mu, \tau^2 + \sigma^2)$.

Whenever several measurements $(y_1, \dots, y_n)$ are available, it is sufficient to
recall that the joint likelihood is proportional to the likelihood of the sample mean
$\bar{y}$, which is normally distributed, $\bar{Y} \sim N(\theta, \sigma^2/n)$, and that the marginal likelihood
in correspondence of the sample mean $\bar{y}$ becomes $f_{H_i}(\bar{y}) = N(\mu, \tau^2 + \sigma^2/n)$.
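
The two cases (a single measurement and a sample mean) can be handled by one function, since a single measurement is a sample of size n = 1. The following sketch assumes the normal model with known variances described above; the function name bf_normal_known_var is hypothetical, and the numerical inputs are those of the color dye example treated in this section.

```r
# Bayes factor for population A versus B from measurements y, with known
# population variances (sigma2_*) and normal priors N(mu_*, tau2_*) on the means.
bf_normal_known_var <- function(y, mu_a, tau2_a, sigma2_a, mu_b, tau2_b, sigma2_b) {
  n <- length(y)
  ybar <- mean(y)
  # Marginal density of the sample mean: N(mu, tau2 + sigma2 / n)
  log_num <- dnorm(ybar, mu_a, sqrt(tau2_a + sigma2_a / n), log = TRUE)
  log_den <- dnorm(ybar, mu_b, sqrt(tau2_b + sigma2_b / n), log = TRUE)
  exp(log_num - log_den)
}

# One tablet, then five tablets.
bf1 <- bf_normal_known_var(0.160, 0.14, 0.003^2, 0.01^2, 0.3, 0.016^2, 0.06^2)
bf5 <- bf_normal_known_var(c(0.155, 0.160, 0.165, 0.161, 0.159),
                           0.14, 0.003^2, 0.01^2, 0.3, 0.016^2, 0.06^2)
c(bf1, bf5)
```

Note that dnorm() takes a standard deviation, so the variances are summed first and the square root is taken afterwards.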


Example 4.4 (Color Dye Concentration in Ecstasy Tablets) A tablet of
unknown origin is analyzed, and the measured color dye concentration is
0.16 (measurements are in %). A prior probability distribution is elicited for
the mean of population A, as $\theta_A \sim N(0.14, 0.003^2)$, and for the mean of
population B, as $\theta_B \sim N(0.3, 0.016^2)$. The population variances $\sigma_A^2$ and
$\sigma_B^2$ are assumed to be known and equal to $0.01^2$ and $0.06^2$, respectively
(Goldmann et al., 2004).


> y=0.160
> pma=0.14
> pva=0.003^2
> pmb=0.3
> pvb=0.016^2
> sigmaa=0.01^2
> sigmab=0.06^2


The Bayes factor in (1.26) can be obtained straightforwardly as the ratio of
two normal likelihoods evaluated for the available measurement of color dye
concentration y.


> BF=dnorm(y,pma,sqrt(pva+sigmaa))/
+ dnorm(y,pmb,sqrt(pvb+sigmab))
> BF


[1] 12.05706


The Bayes factor provides moderate support for the proposition according
to which the analyzed tablet comes from population A, rather than the
proposition according to which the tablet comes from population B. Note
again that this result does not mean that proposition $H_1$ is more probable
than proposition $H_2$. It solely means that the probability density of the observed
concentration $y$ is roughly 12 times greater if the tablet originates from
population A rather than from population B. The posterior odds might be in
favor of proposition $H_2$ even in the presence of a Bayes factor greater than 1,
if the prior probability of proposition $H_1$ is sufficiently small. In the case at
hand, it can be easily verified that the prior probability of proposition $H_1$
needs to be smaller than about 0.077, that is $1/(1 + BF)$, in order for the
posterior odds to be in favor of $H_2$.
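
The verification mentioned above takes one line: the posterior odds are the Bayes factor times the prior odds, so they fall below 1 exactly when the prior probability of $H_1$ is below $1/(1 + BF)$. The sketch below uses the Bayes factor of this example; the helper names prior_threshold and post_odds are hypothetical.

```r
# Largest prior probability of H1 for which the posterior odds still favor H2:
# posterior odds = BF * prior odds < 1  <=>  Pr(H1) < 1 / (1 + BF).
prior_threshold <- function(BF) 1 / (1 + BF)

BF <- 12.05706                        # Bayes factor of Example 4.4
p_max <- prior_threshold(BF)

# Posterior odds in favor of H1 for a given prior probability p1 of H1.
post_odds <- function(p1) BF * p1 / (1 - p1)
c(p_max = p_max, odds_at_0.07 = post_odds(0.07), odds_at_0.08 = post_odds(0.08))
```

A prior probability of 0.07 gives posterior odds below 1 (in favor of $H_2$), whereas 0.08 gives posterior odds above 1 (in favor of $H_1$).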

Suppose now that n = 5 tablets are available, and the color dye concentration
measurements are y = (0.155, 0.160, 0.165, 0.161, 0.159). The value of
the evidence can then be computed for the sample mean


> y=c(0.155,0.160,0.165,0.161,0.159)
> n=length(y)
> num=dnorm(mean(y),pma,sqrt(pva+sigmaa/n))
> den=dnorm(mean(y),pmb,sqrt(pvb+sigmab/n))
> BF=num/den
> BF


[1] 134.628


The Bayes factor now provides moderately strong support for the proposition
$H_1$, compared to proposition $H_2$. This is a direct effect of the increased
number of measurements.


4.3.2 Normal Model and Unknown Variance 


In some applications, both parameters are unknown, and a prior distribution for the 
population mean and the population variance must be introduced. A non-informative 
or a subjective prior distribution may be chosen, as mentioned previously in 
Sect. 3.3.2. 

Consider a case where skeletal remains are analyzed, and the question is whether 
they belong to a man or a woman. The competing propositions are as follows: 


$H_1$: The skeletal remains belong to a woman.
$H_2$: The skeletal remains belong to a man.


The study of Benazzi et al. (2009) found that the measurement of the sacral base
is a useful indicator of sex.

Consider a normal probability distribution for the area of the sacral base,
$X_F \sim N(\theta_F, \sigma_F^2)$ for the population of females, and $X_M \sim N(\theta_M, \sigma_M^2)$ for the population
of males. A conjugate prior probability distribution $f(\theta_i, \sigma_i^2)$ can be assumed for
$(\theta_i, \sigma_i^2)$ as in (3.12), where $(\theta_i \mid \sigma_i^2) \sim N(\mu_i, \sigma_i^2/n_i)$ and $\sigma_i^2 \sim S_i \cdot \chi^{-2}(k_i)$,
$i \in \{F, M\}$. This amounts to an inverse gamma distribution with shape parameter
$\alpha_i = k_i/2$ and scale parameter $\beta_i = S_i/2$, $\sigma_i^2 \sim IG(k_i/2, S_i/2)$.

The marginal density needed to compute the BF, $f_{H_i}(\cdot)$, is a Student t distribution
with $k_i$ degrees of freedom, centered at $\mu_i$, with spread parameter, denoted here $sp_i$,
equal to

\[
sp_i = \frac{n_i}{n_i + 1}\, \alpha_i \beta_i^{-1}
\]

(as noted previously in Sect. 3.3.2). Note that in this case there is one available
measure ($n_y = 1$).


Example 4.5 (Sex Discrimination for Skeletal Remains) The sacral base of a
skeletal remain is measured and found to be 11.5 cm². The prior probability
distribution for $(\theta_i, \sigma_i^2)$, as illustrated in Sect. 3.3.2, is elicited based on the
following population data:


Population               Females   Males
Number of individuals         38      35
Sample mean (cm²)          10.35   14.09
Std dev (cm²)               1.42    1.52


The prior distribution for $(\theta_F \mid \sigma_F^2)$ and $(\theta_M \mid \sigma_M^2)$ can be centered at $\mu_F = 10.35$
and $\mu_M = 14.09$, respectively, with $n_F = 38$ and $n_M = 35$.


> muf=10.35
> nf=38
> mum=14.09
> nm=35


The prior distribution for $\sigma_F^2$ and $\sigma_M^2$ can be elicited using the parameter value
k = 20 (as in Example 3.6) and choosing $S_F$ and $S_M$ such that

\[
\Pr(\sigma_F^2 > 1.42^2) = \Pr(\sigma_M^2 > 1.52^2) = 0.5.
\]


> k=20
> sigmaf=1.42^2
> sigmam=1.52^2
> q=qchisq(0.5,k)
> Sf=q*sigmaf
> Sm=q*sigmam
> c(Sf,Sm)


[1] 38.99199 44.67720


The prior distributions for $\sigma_F^2$ and $\sigma_M^2$ are $39 \cdot \chi^{-2}(20)$ and $45 \cdot \chi^{-2}(20)$,
respectively. The marginal density in the numerator of the Bayes factor is a
Student t distribution with $k_F$ degrees of freedom, centered at $\mu_F = 10.35$
with spread parameter $sp_F = 0.5$ (rounded at the second decimal).


> spf=nf/ (nf+1) *k/Sf 


The marginal density in the denominator of the Bayes factor is a Student t
distribution with $k_M$ degrees of freedom, centered at $\mu_M = 14.09$ with spread
parameter $sp_M = 0.44$ (rounded at the second decimal).


> spm=nm/ (nm+1) *k/Sm 


Note that in this case $k_F = k_M = k$.

The density of a non-standardized Student t distributed random variable (that
is, one with a location and a spread parameter) can be calculated using the
function dstp available in the package LaplacesDemon (Hall et al., 2020). The
Bayes factor can be obtained as follows:


> library(LaplacesDemon)
> y=11.5
> BF=dstp(y,muf,spf,k)/dstp(y,mum,spm,k)
> BF


[1] 3.184994


This value provides weak support for the proposition according to which the 
skeletal remains belong to a woman rather than a man. 
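
As a cross-check that does not require an additional package, the same marginal densities can be written in terms of the standard t density dt() of base R. This is a sketch under the assumption that the spread parameter acts as a precision, i.e., that $(y - \mu)\sqrt{sp}$ follows a standard Student t distribution; the function name dt_loc_prec is a hypothetical helper.

```r
# Density of a Student t distribution with location mu, precision tau and nu
# degrees of freedom, expressed through the standard t density dt().
dt_loc_prec <- function(y, mu, tau, nu) {
  sqrt(tau) * dt((y - mu) * sqrt(tau), df = nu)
}

k <- 20
q <- qchisq(0.5, k)
Sf <- q * 1.42^2                  # scale parameter S_F
Sm <- q * 1.52^2                  # scale parameter S_M
spf <- 38 / (38 + 1) * k / Sf     # spread parameter, females
spm <- 35 / (35 + 1) * k / Sm     # spread parameter, males

y <- 11.5
BF <- dt_loc_prec(y, 10.35, spf, k) / dt_loc_prec(y, 14.09, spm, k)
BF
```

The factor sqrt(tau) in front of dt() is the Jacobian of the change of variable; omitting it would distort the Bayes factor whenever the two spread parameters differ.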


4.3.3 Non-Normal Model 


As pointed out in Sect. 3.4.1.2, certain types of observations lack sufficient regular- 
ity to apply standard parametric models. 

Consider a case where banknotes are seized on an individual following an arrest.
A question commonly asked in such a case is whether the seized banknotes come
from a population of banknotes used in drug dealing activities. The following
propositions may thus be formulated:


$H_1$: The seized banknotes have been used in illegal drug dealing activities
(population $p_1$).
$H_2$: The seized banknotes are from general circulation (population $p_2$).


Fig. 4.1 Drug intensity measured on banknotes of 200 euro in a population of banknotes from
drug trafficking (left) and general circulation (right) (Besson, 2004)


Figure 4.1 shows histograms of drug intensities measured on banknotes from 
drug trafficking (left) and general circulation (right). It can immediately be observed 
that the distributions for the two populations are different, that the distribution 
related to banknotes involved in drug trafficking is not unimodal, and that the one 
for banknotes in general circulation is positively skewed (Besson, 2004). 

Suppose a database is available $\{z_l = (z_{l1}, \dots, z_{lm_l}),\ l = 1, 2\}$. The probability
distribution for population $p_l$, $f_l(\cdot)$, can be estimated by means of kernel density
estimation $\hat{f}_l(\cdot)$ as

\[
\hat{f}_l(y \mid z_{l1}, \dots, z_{lm_l}) = \frac{1}{m_l} \sum_{i=1}^{m_l} K(y \mid z_{li}, h_l),
\tag{4.5}
\]

where $K(y \mid z_{li}, h_l)$ is taken to be a normal distribution centered at $z_{li}$ with variance
equal to $h_l^2 s_l^2$, $s_l^2 = \sum_{i=1}^{m_l} (z_{li} - \bar{z}_l)^2/(m_l - 1)$, and $\bar{z}_l = \sum_{i=1}^{m_l} z_{li}/m_l$.

The estimate $\hat{f}_l(\cdot)$ of the probability density is obtained by adding individual
densities over all observations in the database and then dividing by the number of
observations.

Figure 4.2 shows the kernel density estimates $\hat{f}_1(y \mid z_{11}, \dots, z_{1m_1})$ and
$\hat{f}_2(y \mid z_{21}, \dots, z_{2m_2})$ obtained using (4.5) with the smoothing parameter set equal to 0.15
for both populations. It can be observed that kernel density estimates are more
sensitive to multimodality and skewness and provide a better representation of the
available data.

Fig. 4.2 Drug intensity measured on banknotes of 200 euro in a population of banknotes from
drug trafficking (left) and general circulation (right), and associated kernel density estimates with
smoothing parameter h equal to 0.15

Starting from the available measurements $y = (y_1, \dots, y_n)$ on a sample of size $n$,
a Bayes factor can be obtained as

\[
BF = \frac{f_{H_1}(y)}{f_{H_2}(y)}
   = \frac{\prod_{i=1}^{n} \hat{f}_1(y_i \mid z_{11}, \dots, z_{1m_1})}
          {\prod_{i=1}^{n} \hat{f}_2(y_i \mid z_{21}, \dots, z_{2m_2})}.
\tag{4.6}
\]
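
Equations (4.5) and (4.6) can be sketched in a few lines of base R. The code below is an illustration under stated assumptions: the function names kde and bf_kde are hypothetical, and the two databases are simulated (a bimodal and a right-skewed sample) merely to mimic the qualitative features described above; they are not the banknote data.

```r
# Kernel density estimate (4.5) at the points y, from a database z with
# smoothing parameter h: average of normal kernels centered at each z[i]
# with standard deviation h * sd(z).
kde <- function(y, z, h) {
  sapply(y, function(yi) mean(dnorm(yi, mean = z, sd = h * sd(z))))
}

# Bayes factor (4.6): ratio of products of estimated densities, computed as a
# difference of sums of logs to limit numerical underflow.
bf_kde <- function(y, z1, z2, h1, h2 = h1) {
  exp(sum(log(kde(y, z1, h1))) - sum(log(kde(y, z2, h2))))
}

# Hypothetical databases (simulated; NOT the data of the example).
set.seed(1)
z1 <- c(rnorm(150, 300, 80), rnorm(100, 750, 90))   # bimodal population
z2 <- rgamma(300, shape = 2, rate = 0.01)           # right-skewed population

y <- c(322, 158, 114, 125, 361, 801, 798, 135)
BF <- bf_kde(y, z1, z2, h1 = 0.15)
BF
```

Passing the smoothing parameter as an argument, rather than fixing it inside the kernel function, makes it straightforward to repeat the calculation for several values of h.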


Example 4.6 (Contaminated Banknotes) Consider a case in
which 8 banknotes are seized on a person of interest. Laboratory
analyses of the banknotes reveal drug intensities [du] equal to
y = (322, 158, 114, 125, 361, 801, 798, 135). A database named
banknotes.Rdata is available on the book’s website. It contains
sample data for drug intensities on banknotes from drug trafficking and
general circulation (Fig. 4.1). Note that these are hypothetical data used for
the sole purpose of illustration. The ($n_1 \times 1$) vector of measurements on
banknotes from drug trafficking is extracted and denoted pop1; analogously,
the ($n_2 \times 1$) vector of measurements on banknotes from general circulation
is extracted and denoted pop2.


> load('/.../banknotes.Rdata')
> pop1=bancnotes[[1]]
> pop2=bancnotes[[2]]


The smoothing parameters $h_1$ and $h_2$ are set equal to 0.15. The variances
of drug concentration from each population, $s_1^2$ and $s_2^2$, are estimated by the
sample variance


> h1=0.15
> h2=0.15
> s1=var(pop1)
> s2=var(pop2)


The kernel density estimation in (4.5) for the numerator and the denominator
is computed by means of the functions kn1 and kn2, respectively.


> n1=length(pop1)
> n2=length(pop2)
> sk1=h1*sqrt(s1)
> sk2=h2*sqrt(s2)
> kn1=function(x){sum(dnorm(x,pop1,sk1))/n1}
> kn2=function(x){sum(dnorm(x,pop2,sk2))/n2}


The estimated probability densities are represented in Fig. 4.2. 


> x=matrix(seq(0,1100,1),nrow=1)
> f1h=apply(x,2,kn1)
> f2h=apply(x,2,kn2)
> par(mfrow=c(1,2))
> hist(pop1,freq=F)
> lines(x[1,],f1h,type='l')
> hist(pop2,freq=F)
> lines(x[1,],f2h,type='l')


Consider now the vector of measurements y. The probability densities are 
estimated as in (4.5): 


> y=matrix(c(322,158,114,125,361,801,798,135),nrow=1)
> f1=apply(y,2,kn1)
> f2=apply(y,2,kn2)


and the Bayes factor is obtained as in (4.6): 


> BF=prod (f1)/prod(f2) 
> BF 


[1] 29.72


The Bayes factor represents moderate support for the proposition according
to which the seized banknotes have been used in illegal drug trafficking
rather than the proposition according to which they are part of the general
circulation.


Sensitivity to the Choice of the Smoothing Parameter


The sensitivity of the BF to the choice of the smoothing parameter may be a cause of
concern, as different choices may be made. The smoothing parameter h determines
the shape of the estimated probability density: if it is (too) large, the curve $\hat{f}(y)$
will be (very) smooth; on the other hand, if it is (too) small, the resulting curve will
be more spiky. Figure 4.3 shows, for both populations, the density curves obtained
with h = 0.1 (dotted line), h = 0.15 (solid line), h = 0.2 (dashed line), h = 0.25
(dot-dashed line). The Bayes factor for the available measurements in Example 4.6
is then calculated for several choices of the smoothing parameter h.


> hsens=c(0.1,0.15,0.2,0.25)
> BFsens=rep(0,length(hsens))
> for (i in 1:length(hsens)){
+ sk1=hsens[i]*sqrt(s1)
+ sk2=hsens[i]*sqrt(s2)
+ f1=apply(y,2,kn1)
+ f2=apply(y,2,kn2)
+ BFsens[i]=prod(f1)/prod(f2)}

> round (BFsens, 2) 

[1] 1402.94 29.72 5.63 2.00 


Note that the last two values correspond to large values of the smoothing 
parameter h, providing a very smooth curve. 
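
The same sensitivity analysis can be written without reassigning global variables inside a loop, by making h an explicit argument and reporting the result on the log10 scale, where changes of several orders of magnitude are easier to compare. This is a sketch only: the function names are hypothetical, and the databases are simulated stand-ins, not the banknote data, so the numerical values differ from those reported above.

```r
# Kernel estimator of (4.5) with the smoothing parameter h passed explicitly.
kde <- function(y, z, h) sapply(y, function(yi) mean(dnorm(yi, z, h * sd(z))))

# Natural log of the Bayes factor (4.6) for a common smoothing parameter h.
log_bf_kde <- function(y, z1, z2, h) {
  sum(log(kde(y, z1, h))) - sum(log(kde(y, z2, h)))
}

# Hypothetical simulated databases (not the data of Example 4.6).
set.seed(1)
z1 <- c(rnorm(150, 300, 80), rnorm(100, 750, 90))
z2 <- rgamma(300, shape = 2, rate = 0.01)
y <- c(322, 158, 114, 125, 361, 801, 798, 135)

hsens <- c(0.1, 0.15, 0.2, 0.25)
log10_BF <- sapply(hsens, function(h) log_bf_kde(y, z1, z2, h)) / log(10)
data.frame(h = hsens, log10_BF = round(log10_BF, 2))
```

Measurements lying in the tails of one of the databases tend to dominate such an analysis, because the estimated density there depends strongly on how far the kernels reach.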


4.4 Multivariate Data 


As mentioned in Sect. 3.4, analysts frequently encounter multivariate data because 
the features of examined items and materials, such as handwritten or printed 
documents, glass fragments, or skeletal remains, can be described by more than 
one variable. Such data often present a complex dependence structure with a large 
number of variables and multiple levels of variation. 


4.4.1 Normal Multivariate Data 


The classification of skeletal remains on the basis of sexual dimorphism is a
common problem in paleontology. Section 4.3.2 dealt with the question of how to
quantify the evidential value of measurements of a given morphological trait (e.g.,
the profile of the sacral base). A number of studies have documented sex differences
in particular pelvic traits, such as the obturator foramen, which tends to be oval in males
and triangular in females. The shape of these traits can be described quantitatively
by Fourier descriptors following the image analysis procedure developed by Bierry
et al. (2010). Each item can be described by means of several variables, i.e., the
amplitude and the phase of the first three harmonics.

Fig. 4.3 Sample data used in Example 4.6 regarding drug intensities on banknotes for a
population of banknotes from drug trafficking (top) and in general circulation (bottom), and
associated kernel density estimates with smoothing parameter h equal to 0.1 (dashed line), 0.15
(solid line), 0.2 (dotted line), and 0.25 (dot-dashed line)

Suppose that observations are available from a p-dimensional multivariate
normal distribution whose mean vector and variance-covariance matrix are $\theta_l$ and
$W_l$, respectively, $Z_l \sim N(\theta_l, W_l)$, $l = 1, 2$ (where $l = 1$ stands for the population
of females and $l = 2$ for the population of males). Suppose further that the prior
distribution about $(\theta_l, W_l)$ is chosen in the conjugate family of the normal-inverse
Wishart distribution $NIW(\Omega_l, \nu_l, \mu_l, c_l)$,

\[
f(\theta_l, W_l) \propto |W_l|^{-(\nu_l + p + 2)/2} \exp\left\{ -\frac{c_l}{2} (\theta_l - \mu_l)' W_l^{-1} (\theta_l - \mu_l) - \frac{1}{2} \mathrm{tr}\left( W_l^{-1} \Omega_l \right) \right\},
\]

where $\mu_l$ is the center vector, $c_l$ are the degrees of freedom associated with the center
vector $\mu_l$, $\Omega_l$ is the dispersion matrix, and $\nu_l$ are the degrees of freedom associated
with the dispersion matrix $\Omega_l$ (O'Hagan & Kendall, 1994).

Consider now a case where skeletal remains are recovered, and the following 
propositions are of interest: 


$H_1$: The skeletal remains belong to a woman (i.e., a member of population $p_1$).
$H_2$: The skeletal remains belong to a man (i.e., a member of population $p_2$).


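Denote by $y = (y_1, \ldots, y_p)$ the measurements (i.e., Fourier descriptors) related
to the item whose origin is unknown and that needs to be classified. The marginal
distribution under the competing propositions $H_1$ and $H_2$, $f_{H_l}(y)$ for $l = 1, 2$, can
be obtained as

\[
\begin{aligned}
f_{H_l}(y \mid \mu_l, c_l, \Omega_l, \nu_l) &= \int f(y \mid \theta_l, W_l) \, f(\theta_l, W_l) \, d(\theta_l, W_l) \\
&\propto \left[ 1 + \frac{c_l}{c_l + 1} (y - \mu_l)' \Omega_l^{-1} (y - \mu_l) \right]^{-(\nu_l + 1)/2}.
\end{aligned} \tag{4.7}
\]

This is a p-dimensional Student t distribution with $\delta_l = \nu_l + 1 - p$ degrees of
freedom, location $\mu_l$, and scale matrix

\[
\frac{(c_l + 1) \, \Omega_l}{c_l \, \delta_l}.
\]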


1 Note that a conjugate prior distribution may not always be the best choice. A method for
assessing a non-conjugate prior distribution where the mean vector and the covariance matrix of
the multivariate normal are, a priori, independent is provided by Garthwaite and Al-Awadhi (2001).

The Bayes factor can be obtained as

\[
BF = \frac{f(y \mid \mu_1, c_1, \Omega_1, \nu_1)}{f(y \mid \mu_2, c_2, \Omega_2, \nu_2)}.
\]


4.4.1.1 Prior Distribution for the Unknown Mean and Variance 


Four parameters must be elicited. The elicitation of $\mu_l$ is rather simple. Since $\mu_l$
represents the mean, the median, and the mode of the prior probability distribution,
the analyst may assess any of these summaries (O'Hagan et al., 2006). A procedure
for the elicitation of the degrees of freedom $c$ and $\nu$ and the dispersion matrix $\Omega$ has
been provided by Al-Awadhi and Garthwaite (1998).

Here, suppose a non-informative prior distribution is used:

\[
f(\theta_l, W_l) \propto |W_l|^{-(p+1)/2}.
\]


A database is available, with $n_1$ measurements for the population of females ($p_1$)
and $n_2$ measurements for the population of males ($p_2$). The corresponding posterior
distributions (one for the numerator, one for the denominator) can be written as

\[
(\theta_l \mid W_l, \bar z_l) \sim N(\bar z_l, W_l / n_l), \tag{4.8}
\]
\[
(W_l \mid z_l) \sim IW(S_l, n_l - 1), \tag{4.9}
\]

where $S_l = \sum_{j=1}^{n_l} (z_{lj} - \bar z_l)(z_{lj} - \bar z_l)'$ is the sum of the squares about the sample
mean and $\bar z_l = \sum_{j=1}^{n_l} z_{lj} / n_l$.

The marginal likelihood $f_{H_l}(y)$ is, therefore, a p-dimensional Student t distribution
with $n_l - p$ degrees of freedom, location vector $\bar z_l$, and scale matrix

\[
F_l = \frac{(n_l + 1) \, S_l}{n_l (n_l - p)}, \tag{4.10}
\]

so that $(y \mid \bar z_l, F_l, n_l - p) \sim t_{n_l - p}(\bar z_l, F_l)$.
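
The marginal density in (4.10) can be evaluated on the log scale with base R only, which may help when densities become very small. The following is a minimal sketch under the assumptions stated in this section; the function names log_dmvt and log_marginal are hypothetical and are not part of any package used in the chapter.

```r
# Log-density of a p-dimensional Student t distribution with location mu,
# scale matrix Sigma and df degrees of freedom (base R only).
log_dmvt <- function(y, mu, Sigma, df) {
  p <- length(y)
  d <- as.numeric(y) - as.numeric(mu)
  q <- sum(d * solve(Sigma, d))                      # quadratic form
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  lgamma((df + p) / 2) - lgamma(df / 2) - (p / 2) * log(df * pi) -
    0.5 * logdet - ((df + p) / 2) * log1p(q / df)
}

# Log marginal density of y under one proposition, from the database
# summaries zbar (location), S (sum of squares) and n (sample size).
log_marginal <- function(y, zbar, S, n) {
  p <- length(y)
  Fmat <- S * (n + 1) / (n * (n - p))                # scale matrix as in (4.10)
  log_dmvt(y, zbar, Fmat, n - p)
}
```

The log Bayes factor is then the difference of two such calls, one per population, for example log_marginal(y, m1, S1, n1) - log_marginal(y, m2, S2, n2) with the summaries of the two databases.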


Example 4.7 (Sex Discrimination for Skeletal Remains Using Multivariate
Data) Skeletal remains are recovered, and the obturator foramen area is
measured. The measurements of the first three pairs of Fourier descriptors
are as follows:


                  Amplitude   Phase
First harmonic    0.083095    2.6527709
Second harmonic   0.932333    0.4530559
Third harmonic    0.413736    0.3174581


Suppose that two databases of dimensions $(n_1 \times p) = (51 \times 6)$ and
$(n_2 \times p) = (50 \times 6)$ are available for the population of women and men,
respectively. These two databases can be used to obtain the summaries $\bar z_1$, $\bar z_2$
(i.e., the location vectors) and $S_1$, $S_2$ (i.e., the sum of the squares about the
sample means) that are needed to calculate the marginal probability densities
of the available measurements under the competing propositions. The location
vectors $\bar z_1$ and $\bar z_2$ and the sum of the squares about the sample means $S_1$ and
$S_2$ can be obtained straightforwardly as


> as.matrix(colMeans(population))
> cov(population)*(n-1)


where population is a database of dimension (n x p) containing the
available data. Note that only the summaries $\bar z_1$, $\bar z_2$, $S_1$, $S_2$, as well as the vector
of measurements y, are available in the database skeletal.Rdata and can
be obtained as


> load('skeletal.Rdata')
> y
       A1      Phi1        A2      Phi2        A3      Phi3
0.0830950 2.6527709 0.9323330 0.4530559 0.4137360 0.3174581
> cbind(m1,m2)
           [,1]       [,2]
A1   0.07500563 0.05078316
Phi1 2.60792515 3.37739963
A2   1.08366494 1.15684192
Phi2 0.17014670 0.08233948
A3   0.50490100 0.39364526
Phi3 0.34169629 0.39422141
> S1


Al Phil A2 


Al 0.09264018 0.3596888 0.16429701 
Pitt O,SS96E050 27 TOVSOLS 1,5 7099OL9 
A2 On, WGA2970i I SVOIYLO2 2, 34509053 
Phi2 0.12045387 0.3766871 -0.03091293 
A3 QO ,02Z1OS56 -O. 703721 =O. 2890817 
Phi3 -0.04738129 0.7762414 -0.36724194 

Phi2 A3 Phi3 
Al O,U2ACS287 O 02310556 —0 047/329) 
Phil 0.37668708 -0.70372110 0.77624136 
A2 =9 020912493 -©,.2890S3L17 -0. 36 720io4 
Phi2 0.36278820 -0.02996462 -0.04588018 
A3 =0 02996462 OO. 58676167 0.545 21185 
Phi3 -0.04588018 0.31452185 0.53788595 
> S2

Al Phil A2 

Al 0.059683655 0.41066454 -0.02342685 
Phil 0.410664544 138.15898708 2.29687413 
A2 -0.023426848 2m2 68 pais i SIAS PASS) 
Phi2 0.049798218 -1.91573412 -0.02475354 
A3 0.082509024 Oo O19 siete =—@. dilSg gsi 
Pinks On 007252672 2 AISI NOSS =0).3AVSAINS 

Phi2 A3 Phi3 
Al 0.04979822 0.08250902 0.007252672 
Phil -1.91573412 0.01934154 2.475336335 
A2 =0 024752354 -@, 31599891, -0. 337547756 
Phi2 0.25584612 0.12310366 -0.149047658 
A3 O,1IASLO266 O,S9S61567 0 ,.AasiS 5 ogi 


Phi3 -0.14904766 0.22515583 0.608557392 


The marginal density $f_{H_1}(y)$ in the numerator of the Bayes factor is a p-dimensional
Student t distribution with $n_1 - p = 45$ degrees of freedom,
location m1 as above, and scale matrix

> n1=51
> p=6
> F1=S1*(n1+1)/(n1*(n1-p))

The marginal density $f_{H_2}(y)$ in the denominator of the Bayes factor is a p-dimensional
Student t distribution with $n_2 - p = 44$ degrees of freedom,
location m2 as above, and scale matrix

> n2=50
> F2=S2*(n2+1)/(n2*(n2-p))


The density of a multivariate Student t distributed random variable can be cal- 
culated using the function dmvt available in the package LaplacesDemon 
(Hall et al., 2020). 


> library(LaplacesDemon)
> num=dmvt(y,t(m1),F1,n1-p,log=FALSE)
> den=dmvt(y,t(m2),F2,n2-p,log=FALSE)
> num/den
[1] 1545.439


The Bayes factor represents strong support for the proposition according to
which the skeletal remains originate from a woman (population $p_1$) rather
than from a man (population $p_2$).


As discussed in Sect. 3.4.2, it is important to study the performance of the
proposed model. This can be achieved by using the available databases to generate 
many test cases and computing relevant performance metrics. 
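
One common way to generate such test cases is a leave-one-out scheme, in which each database item is removed in turn and treated as the questioned item. The following is a minimal sketch on simulated data; the databases pop1 and pop2, the helper names, and the choice of leave-one-out are all assumptions made for illustration and are not taken from Sect. 3.4.2.

```r
# Log-density of a multivariate Student t (location mu, scale Sigma, df).
log_dmvt <- function(y, mu, Sigma, df) {
  p <- length(y)
  d <- y - mu
  q <- sum(d * solve(Sigma, d))
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  lgamma((df + p) / 2) - lgamma(df / 2) - (p / 2) * log(df * pi) -
    0.5 * logdet - ((df + p) / 2) * log1p(q / df)
}

# Log marginal density of y given a background database x (rows = items),
# using the location and scale matrix of (4.10).
log_marginal_db <- function(y, x) {
  n <- nrow(x); p <- ncol(x)
  S <- cov(x) * (n - 1)
  log_dmvt(y, colMeans(x), S * (n + 1) / (n * (n - p)), n - p)
}

set.seed(42)
pop1 <- matrix(rnorm(40 * 2, mean = 0), ncol = 2)     # simulated database 1
pop2 <- matrix(rnorm(40 * 2, mean = 1.5), ncol = 2)   # simulated database 2

# Leave-one-out: log f(y | own database without y) - log f(y | other database).
loo_logBF <- function(own, other) {
  sapply(seq_len(nrow(own)), function(i) {
    log_marginal_db(own[i, ], own[-i, , drop = FALSE]) -
      log_marginal_db(own[i, ], other)
  })
}
logBF1 <- loo_logBF(pop1, pop2)     # H1 true: log BF > 0 expected
logBF2 <- -loo_logBF(pop2, pop1)    # H2 true: log BF < 0 expected

# Proportions of test cases in which the BF points in the wrong direction.
c(misleading_H1 = mean(logBF1 < 0), misleading_H2 = mean(logBF2 > 0))
```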


4.4.1.2 Classification as a Decision 


The BF obtained in Example 4.7 supports proposition $H_1$ over $H_2$. However, if a
decision is to be made, one needs to take into account the prior uncertainty (in terms
of probabilities) about the competing propositions and the undesirability (in terms
of losses) of adverse outcomes (i.e., classification errors).

Let $\pi_1$ and $\pi_2$ denote the prior probabilities of propositions $H_1$ and $H_2$. The
posterior probabilities $\alpha_1$ and $\alpha_2$ can be easily calculated as

\[
\alpha_l = \frac{\pi_l \, f(y \mid \mu_l, c_l, \Omega_l, \nu_l)}{\sum_{j=1}^{2} \pi_j \, f(y \mid \mu_j, c_j, \Omega_j, \nu_j)}, \qquad l = 1, 2,
\]

where the marginals $f(y \mid \mu_j, c_j, \Omega_j, \nu_j)$, $j = 1, 2$, are as in (4.7).

A criterion that can be used to classify the recovered item into one of the two
populations has been outlined in Sect. 1.9. When using a "$0 - l_i$" loss function
(Table 1.4), the Bayes decision criterion states that the decision $d_1$, classifying the
recovered item in the population of females ($p_1$), is optimal whenever

\[
BF > \frac{l_1 / l_2}{\pi_1 / \pi_2} = c. \tag{4.11}
\]


Example 4.8 (Sex Discrimination for Skeletal Remains Using Multivariate
Data, Continued) If the prior odds are 1, and a symmetric loss function
is chosen (i.e., $l_1 = l_2$), the criterion in (4.11) says that the decision $d_1$ is
optimal whenever BF > 1.
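
The criterion in (4.11) can be recovered by comparing posterior expected losses. As a sketch, write $\alpha_1$ and $\alpha_2 = 1 - \alpha_1$ for the posterior probabilities of $H_1$ and $H_2$, and assume that a correct classification incurs no loss, while $l_1$ and $l_2$ are the losses of wrongly deciding $d_1$ and $d_2$, respectively:

```latex
\[
\begin{aligned}
E[L(d_1)] &= l_1 \, \alpha_2, \qquad E[L(d_2)] = l_2 \, \alpha_1, \\
E[L(d_1)] < E[L(d_2)]
  &\iff \frac{\alpha_1}{\alpha_2} > \frac{l_1}{l_2}
  \iff BF \times \frac{\pi_1}{\pi_2} > \frac{l_1}{l_2}
  \iff BF > \frac{l_1 / l_2}{\pi_1 / \pi_2} = c.
\end{aligned}
\]
```

With $\pi_1 = \pi_2$ and $l_1 = l_2$, the threshold reduces to $c = 1$.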

Assuming equal prior probabilities may be unrealistic because, often,
there is at least some information to help assert whether one proposition is
more probable than the stated alternative proposition. Likewise, the decision
maker's preferences among adverse outcomes may not properly be reflected
by a symmetric loss function, though it should be noted that what actually
matters is only the ratio of $l_1$ to $l_2$.

To investigate the effect of alternative choices for the prior odds and the
loss function, one can conduct a sensitivity analysis. Figure 4.4 shows an
example for the threshold c in (4.11) as a function of increasing values of
the prior probability $\pi_1$ and for different asymmetric loss functions, where $l_2$,
the loss associated with the adverse outcome of the decision $d_2$, is fixed at 1,
and $l_1$, associated with the adverse outcome of the decision $d_1$, is equal to 10,
50, and 100.

This analysis reveals that $d_1$ is not the optimal decision for very high values
of $l_1$, compared to $l_2$, and for very small values of the prior probability $\pi_1$.
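
A sensitivity analysis of this kind takes only a few lines of R. The following is a minimal sketch: the function names bf_threshold and expected_losses and the grid of prior probabilities are hypothetical choices for illustration, not the code used to produce Fig. 4.4.

```r
# Threshold c of (4.11): the BF above which d1 has the smaller expected loss.
bf_threshold <- function(pi1, l1, l2 = 1) {
  (l1 / l2) / (pi1 / (1 - pi1))
}

# Expected losses of d1 and d2 given a BF and the prior probability pi1 of H1.
expected_losses <- function(BF, pi1, l1, l2 = 1) {
  alpha1 <- BF * pi1 / (BF * pi1 + (1 - pi1))   # posterior probability of H1
  c(d1 = l1 * (1 - alpha1), d2 = l2 * alpha1)
}

pi1 <- c(0.05, 0.1, 0.25, 0.5)                  # illustrative prior probabilities
l1 <- c(10, 50, 100)                            # loss of wrongly deciding d1
thr <- sapply(l1, function(l) bf_threshold(pi1, l))
dimnames(thr) <- list(pi1 = pi1, l1 = l1)
round(thr, 1)
```

Comparing a reported BF with the entries of thr shows for which combinations of prior probability and loss ratio the decision $d_1$ remains optimal.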


Fig. 4.4 Threshold c, the BF necessary to ensure that the decision $d_1$ has the smaller
expected loss than the decision $d_2$, as specified by Eq. (4.11), as a function of
the prior probability $\pi_1$ and for different loss ratios $l_1 / l_2$


4.4.2 Two-Level Models


A recurrent problem in forensic practice is to help distinguish between legal and 
illegal cannabis plants (Bozza et al., 2014). Cannabis seedlings can be discriminated, 
to some extent, on the basis of their chemical profiles using chemometric tools and 
a methodology as described in Broséus et al. (2010). This study focused on several 
target compounds, taking into account their presence in drug type (illegal) and fiber 
type (legal) Cannabis. 

Suppose a dataset is available that consists of replicate measurements (n) made
on illegal plants (population $p_1$) and on fiber type plants (population $p_2$). The
sample size is equal to $m_1$ and $m_2$ for populations $p_1$ and $p_2$, respectively.
Background data can be denoted by $z_{lij} = (z_{lij1}, \ldots, z_{lijp})$, where $l = 1, 2$,
$i = 1, \ldots, m_l$, $j = 1, \ldots, n$, and p is the number of variables. Available data suggest
that a statistical model with two levels of variation is suitable: variation between
replicate measurements from the same source and variation between measurements
from different sources.


4.4.2.1 Normal Distribution for the Between-Source Variation 


Here we use the two-level random effect model described in Sect. 3.4.1.1. For
the within-source variation, the distribution of $Z_{lij}$ is taken to be normal,
$Z_{lij} \sim N(\theta_{li}, W_l)$. For the between-source variation, denote the mean vector between
sources by $\mu_l$, and the matrix of between-source variances and covariances by $B_l$.
The distribution of $\theta_{li}$ is taken to be normal, $\theta_{li} \sim N(\mu_l, B_l)$.

Measurements are available on some seized material, denoted by
$y = (y_1, \ldots, y_n)$, where $y_j = (y_{j1}, \ldots, y_{jp})$, $j = 1, \ldots, n$. A laboratory is asked
to help determine the plant's chemotype. The following propositions may be of
interest:


$H_1$: The seized plant is drug type Cannabis (population $p_1$).
$H_2$: The seized plant is fiber type Cannabis (population $p_2$).


The probability distribution of the measurements on items from each population
is taken to be normal, $Y \sim N(\theta_l, W_l)$, $l = 1, 2$. The marginal probability densities in
the numerator and denominator have the form $f_{H_l}(y) = f_l(y \mid \mu_l, W_l, B_l)$, $l = 1, 2$,
and can be obtained as in (3.28)


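\[
\begin{aligned}
f_l(y \mid \mu_l, W_l, B_l) ={}& |2\pi W_l|^{-n/2} \, |2\pi B_l|^{-1/2} \, \left| 2\pi \left( n W_l^{-1} + B_l^{-1} \right)^{-1} \right|^{1/2} \\
& \times \exp\left\{ -\frac{1}{2} \left[ (\bar y - \mu_l)' \left( n^{-1} W_l + B_l \right)^{-1} (\bar y - \mu_l) + \mathrm{tr}\left( S W_l^{-1} \right) \right] \right\},
\end{aligned} \tag{4.12}
\]

where $S = \sum_{j=1}^{n} (y_j - \bar y)(y_j - \bar y)'$ and $\bar y = \sum_{j=1}^{n} y_j / n$.

The Bayes factor can then be obtained as in (1.26) as a ratio between the two
marginals

\[
\begin{aligned}
BF = \frac{f_{H_1}(y)}{f_{H_2}(y)} ={}& \frac{f_1(y \mid \mu_1, W_1, B_1)}{f_2(y \mid \mu_2, W_2, B_2)} \\
={}& \frac{|W_1|^{-n/2} \, |B_1|^{-1/2} \, \left| n W_1^{-1} + B_1^{-1} \right|^{-1/2}}{|W_2|^{-n/2} \, |B_2|^{-1/2} \, \left| n W_2^{-1} + B_2^{-1} \right|^{-1/2}} \\
& \times \exp\left\{ -\frac{1}{2} \sum_{l=1}^{2} (-1)^{l+1} \left[ \mathrm{tr}\left( S W_l^{-1} \right) + (\bar y - \mu_l)' \left( n^{-1} W_l + B_l \right)^{-1} (\bar y - \mu_l) \right] \right\}.
\end{aligned} \tag{4.13}
\]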


The overall means $\mu_1$ and $\mu_2$, the within-source covariance matrices $W_1$ and $W_2$,
and the between-source covariance matrices $B_1$ and $B_2$ can be estimated from the
available background data using (3.32), (3.33), and (3.34).
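
The marginal density of the two-level normal model can be coded directly on the log scale. The following is a minimal sketch that assumes the form of (4.12) given above; log_marg_two_level and log_BF are hypothetical names and are not the chapter's function two.level.mvn.inv.BF.

```r
# Log of the marginal density (4.12) for an (n x p) matrix y of replicate
# measurements, given the overall mean mu, within-source covariance W and
# between-source covariance B.
log_marg_two_level <- function(y, mu, W, B) {
  n <- nrow(y); p <- ncol(y)
  ybar <- colMeans(y)
  S <- crossprod(sweep(y, 2, ybar))          # sum of squares about ybar
  d <- ybar - as.numeric(mu)
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  A <- n * solve(W) + solve(B)
  -(n * p / 2) * log(2 * pi) - (n / 2) * logdet(W) - 0.5 * logdet(B) -
    0.5 * logdet(A) -
    0.5 * (sum(d * solve(W / n + B, d)) + sum(diag(S %*% solve(W))))
}

# Log Bayes factor as the difference of the two log marginals, cf. (4.13).
log_BF <- function(y, mu1, W1, B1, mu2, W2, B2) {
  log_marg_two_level(y, mu1, W1, B1) - log_marg_two_level(y, mu2, W2, B2)
}
```

Working with exp(log_BF(...)) rather than a ratio of raw densities reduces the risk of numerical underflow when the number of variables or replicates is large.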


Example 4.9 (Cannabis Seedlings) A plant of unknown type is analyzed, and
the chemical profile is extracted. Three replicate measurements are taken (n =
3) on three variables (p = 3): Cannabidiol (CBD), Δ9-Tetrahydrocannabinol
(THC), and Cannabinol (CBN). Measurements on the item of unknown type
are as follows:


    CBD      THC      CBN
-1.3040   0.2310   0.6874
-1.2918   0.2400   0.7350
-1.0719   0.3176   0.9113


> y=matrix(c(-1.304,0.231,0.6874,-1.2918,0.24,0.735,
+ -1.0719,0.3176,0.9113),nrow=3,byrow=TRUE)


The mean vectors between sources $\mu$, the within-source covariance matrices
$W$, and the between-source covariance matrices $B$ can be estimated from the
available background data (Bozza et al., 2014).

The estimates of the overall means $\mu_1$ and $\mu_2$, of the within-source covariance
matrices $W_1$ and $W_2$, and of the between-source covariance matrices $B_1$
and $B_2$ are available in the database plant.Rdata and can be obtained as

> load('plant.Rdata')
> mu1
            CBD       THC       CBN
[1,] -0.4566709 0.9728053 0.9196972
> mu2
           CBD        THC        CBN
[1,] 0.4097014 -0.7850832 -0.7592971
> W1 

CBD THE CBN 


CEDROMOIOSSI26 
TC 0. 0OLSVEVS 
CBN 0.01038024 


0.015787374 
0.015708590 
0.005226694 


0.010380235 
0.005226694 
0.094354823 


> W2 


MAC 
I. 901700- 03 
5.685754e-04 
7.930402e-05 


CBN 
>39 6992128504 
7.930402e-05 
LL SVB92Z4e=02 


CBD 
0.0180694402 
0.0019017082 

-0.0003699212 


CBD 
MAE 
CBN 


THC CBN 
om2 52 ie O, 1a 7Oss2 
@Q 2752159 0. 3393965 
OQ IOSIS O CAOS 


CBD 
CBD 0.4154039 
HEC 0). 2352106) 
CBN 0.1470832 


jo) 


CBD THC CBN 


CBDA 1081258 
THC 0.05630523 
CBN 0.01847022 


0705630523 
0.06703743 
0.05462002 


0.01847022 
0.05462002 
0.10964122 


These estimates can be obtained using the function two.level.mv.WB
introduced in Sect. 3.4.1.1


> two.level.mv.WB(population,variables,
+ grouping.object)


where population is a data frame with the available data, 
variables indicates the columns where variables are displayed, and 
grouping.object indicates the item number. 


Given the available measurements, the Bayes factor can be calculated as in
(4.13) using the function two.level.mvn.inv.BF.


> BF=two.level.mvn.inv.BF(y,W1,W2,B1,B2,mu1,mu2)
> BF
[1] 48739.7


The Bayes factor represents very strong support for the proposition according 
to which the seized plant is of drug type rather than fiber type. 


4.4.2.2 Non-normal Distribution for the Between-Source Variation 


As noted in Sect. 3.4.1.2, whenever the assumption of normality for the between-source
variability is considered inappropriate, the normal distribution
$f(\theta_{li} \mid \mu_l, B_l) = N(\mu_l, B_l)$ previously proposed can be replaced by a kernel density
estimate as in (3.35). The marginal densities $f_{H_l}(y)$ at the numerator and denominator
of the Bayes factor become

\[
\begin{aligned}
f_l(y \mid W_l, B_l, h_l) ={}& (2\pi)^{-p/2} \, |B_l|^{-1/2} \, (m_l h_l^p)^{-1} \, |D_l|^{-1/2} \, \left| D_l^{-1} + h_l^{-2} B_l^{-1} \right|^{-1/2} \\
& \times \sum_{i=1}^{m_l} \exp\left\{ -\frac{1}{2} (\bar y - \bar z_{li})' \left( D_l + h_l^2 B_l \right)^{-1} (\bar y - \bar z_{li}) \right\},
\end{aligned} \tag{4.14}
\]

where $D_l = n^{-1} W_l$. Note that this is just the marginal density of the recovered data,
that is, the first line in (3.38), with all multiplicative constants.

The Bayes factor is then given by the ratio of the marginal probability densities
in (4.14) for $l = 1, 2$, that is,

\[
BF = \frac{f_1(y \mid W_1, B_1, h_1)}{f_2(y \mid W_2, B_2, h_2)}. \tag{4.15}
\]

Example 4.10 (Cannabis Seedlings, Continued) Consider again the case
examined in Example 4.9, and suppose that a kernel distribution is
used to model the between-source variability. First, the group means $\bar z_{li}$
must be obtained. They can be obtained as an output of the function
two.level.mv.WB that can be used to estimate the model parameters.

> head (group.means.1) 


CBD MEE CBN 
IL cil 22249231 O,.2629209 0.771929 
% =O, 047348919 Io /O0VIS0 2,493962 
3 =-O,S90S6072 lols /ao7s 1 So0s290 
“=O A2VV7asoo9i iI, Satials 1 gI2527 
5 -0.54204482 1.2387804 1.545526 
6 -0.65989575 -0.9686288 1.831042 


> head (group.means. 2) 


CBD TEE CBN 
aA =) ZSAE =i 0232887 =0 8967/59 
isa 30 168927410 —0 093413 0 195759) 
143 -0.61568550 -1.0464456 -0.896759 
144 0.03267767 -0.9815586 -0.896759 
145 0, 1a2oaysol =O 0349306 =0) 96752) 
IAG =-0.51730995 -0,9909642 =O 896759 


> m1=dim(group.means.1)[1]
> m2=dim(group.means.2)[1]
> c(m1,m2)
[1] 117 155


Here we show only the first six rows of the ($m_l \times p$) matrices, where each
row represents the vector of means $\bar z_{li} = \frac{1}{n} \sum_{j=1}^{n} z_{lij}$, $l = 1, 2$. Note that the
group means $\bar z_{1i}$ and $\bar z_{2i}$, as well as all the estimated parameters ($\mu_1$, $\mu_2$, $W_1$,
$W_2$, $B_1$ and $B_2$) are available in the database plant.Rdata.

The smoothing parameters $h_1$ and $h_2$ in the two populations can be
estimated as in (3.36), using the function hopt:


> p=3
> h1=hopt(p,m1)
> h2=hopt(p,m2)
> c(h1,h2)
[1] 0.4675469 0.4491338
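
The values above are consistent with the widely used normal-reference choice of the smoothing parameter, $h = (4 / (2p + 1))^{1/(p+4)} \, m^{-1/(p+4)}$. The following sketch assumes that (3.36) has this form; hopt_sketch is a hypothetical name, not the function hopt itself.

```r
# Normal-reference smoothing parameter for p variables and m sources
# (assumed form of (3.36); not taken from the book's code).
hopt_sketch <- function(p, m) {
  (4 / (2 * p + 1))^(1 / (p + 4)) * m^(-1 / (p + 4))
}

c(hopt_sketch(3, 117), hopt_sketch(3, 155))
```

With p = 3 and the sample sizes m1 = 117 and m2 = 155, this agrees with the printed values to several decimal places.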


Given the available measurements, the Bayes factor can be calculated as
in (4.15) using the function two.level.mvk.inv.BF, available in the
supplementary materials on the book's website:


> BF=two.level.mvk.inv.BF(y,group.means.1,
+ group.means.2,W1,W2,B1,B2,h1,h2)
> BF


[i] s42 


The Bayes factor represents moderate support for the proposition according to 
which the seized plant is drug type Cannabis rather than fiber type Cannabis. 


4.4.2.3 Assessing Model Performance 


One way to investigate the performance of the two models described in Sects. 4.4.2.1 
and 4.4.2.2, denoted here Model 1 and Model 2, is to calculate a Bayes factor for all 
available measurements on items from population 1 (drug type). One would expect 
to obtain BFs greater than 1 (see Table 4.1). Clearly, one should also consider BF 
computations for all measurements on items from population p2 (fiber type). In the 
latter case, BFs smaller than 1 would be expected (see Table 4.2). 


Table 4.1 Bayes factor values for items of population 1 (Examples 4.9 and 4.10) obtained using
(4.13) (Model 1) and (4.15) (Model 2)

BF                  Model 1   Model 2
< 10^-1                   2         2
10^-1 to 1                1         3
1 to 10                   2         7
10 to 10^2                0         7
10^2 to 10^3              2         9
10^3 to 10^4              3         8
10^4 to 10^5              5         2
10^5 to 10^6              4         3
10^6 to 10^7              1         5
10^7 to 10^8              6         3
10^8 to 10^9              1         4
10^9 to 10^10             8         2
10^10 to 10^100          82        62
Number of BFs > 1       114       112
Number of BFs < 1         3         5


Table 4.2 Bayes factor values for items of population 2 (Examples 4.9 and 4.10) obtained using
(4.13) (Model 1) and (4.15) (Model 2)

BF                  Model 1   Model 2
< 10^-10                 20        85
10^-10 to 10^-9           4         0
10^-9 to 10^-8            8         0
10^-8 to 10^-7           10         0
10^-7 to 10^-6           14         0
10^-6 to 10^-5           29         0
10^-5 to 10^-4           19         0
10^-4 to 10^-3           20         0
10^-3 to 10^-2           16        35
10^-2 to 10^-1            2        23
10^-1 to 1                1         7
1 to 10                   6         3
10 to 10^2                0         1
10^2 to 10^3              0         1
> 10^4                    6         0
Number of BFs > 1        12         5
Number of BFs < 1       143       150
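
Summaries of this kind can be produced from any vector of Bayes factors. The following is a minimal sketch; the function summarise_bf and the example values are made up for illustration and are not the data behind Tables 4.1 and 4.2.

```r
# Tabulate Bayes factors by order of magnitude and count how many point
# toward each proposition.
summarise_bf <- function(bf, exponents = -1:10) {
  breaks <- c(0, 10^exponents, Inf)
  bins <- cut(bf, breaks = breaks, right = FALSE)   # intervals [lower, upper)
  list(counts = table(bins),
       n_above_1 = sum(bf > 1),
       n_below_1 = sum(bf < 1))
}

bf_example <- c(0.05, 0.5, 3, 40, 2e3, 7e5, 1e12)   # made-up Bayes factors
res <- summarise_bf(bf_example)
res$counts
c(res$n_above_1, res$n_below_1)
```

For items known to come from population 1, n_below_1 counts the test cases in which the BF points in the wrong direction; for population 2, the roles are reversed.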


4.5 Summary of R Functions 


The R functions outlined below have been used in this chapter. 


Functions Available in the Base Package 

apply: Applies a function to the margins (either rows or columns) of a matrix. 

colMeans: Forms column means for numeric arrays (or data frames). 

d<name of distribution> (e.g., dnorm): Calculates the density for many 
parametric distributions. 

More details can be found in the Help menu, help.start(). 


Functions Available in Other Packages 

dbbinom and ddirmnom in the package extraDistr: Calculate the density 
of a beta-binomial distribution and that of a Dirichlet-multinomial distribution, 
respectively. 

dstp and dmvt in the package LaplacesDemon: Calculate the density of a 
non-central Student t distribution and of a non-central multivariate Student t 
distribution, respectively. 

fitdist and fitDirichlet in the package SHELF: Fit a parametric distribution
starting from elicited probabilities and a Dirichlet distribution from the
elicited beta distributions for a set of proportions, respectively.


Functions Developed in the Chapter

beta_prior: Calculates the hyperparameters $\alpha$ and $\beta$ of a beta distribution
Be($\alpha$, $\beta$) starting from the prior mean m and the prior variance v.

Usage: beta_prior(m,v).

Arguments: m, the prior mean; v, the prior variance.

Output: A vector of values, the first is $\alpha$, the second is $\beta$.
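
The mapping from a prior mean and variance to the beta hyperparameters follows from the moments of the beta distribution. The following is a minimal sketch of that calculation; beta_prior_sketch is a hypothetical name and need not match the chapter's implementation of beta_prior.

```r
# Method of moments: with k = m(1 - m)/v - 1 = alpha + beta,
# alpha = m * k and beta = (1 - m) * k.
beta_prior_sketch <- function(m, v) {
  stopifnot(m > 0, m < 1, v > 0, v < m * (1 - m))   # moments must be attainable
  k <- m * (1 - m) / v - 1
  c(alpha = m * k, beta = (1 - m) * k)
}

ab <- beta_prior_sketch(0.2, 0.01)
ab
```

The check on v reflects the fact that a beta distribution with mean m cannot have a variance of m(1 - m) or more.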


hopt: Calculates the estimate $\hat h$ of the smoothing parameter h.
Usage: hopt (p,m). 

Arguments: p, the number of variables; m, the number of sources. 
Output: A scalar value. 


kn1: Computes the kernel density estimate (numerator).

Usage: kn1(x,pop1,sk1).

Arguments: x, a vector of available measurements; pop1, a vector of measurements
of drug intensities on banknotes from drug trafficking where the kernel is
centered; sk1, the variance $h_1^2 s_1^2$ of the kernel, where $h_1$ is the smoothing
parameter and $s_1^2$ is the sample variance of the available measurements.

Output: A scalar value.


post_distr: Computes the posterior distribution $N(\mu_x, \tau_x^2)$ of a normal mean $\theta$,
with $X \sim N(\theta, \sigma^2)$ and $\theta \sim N(\mu, \tau^2)$.

Usage: post_distr(sigma,n,barx,pm,pv).

Arguments: sigma, the variance $\sigma^2$ of the observations; n, the number of observations;
barx, the sample mean $\bar x$ of the observations; pm, the mean $\mu$ of the prior
distribution $N(\mu, \tau^2)$; pv, the variance $\tau^2$ of the prior distribution $N(\mu, \tau^2)$.

Output: A vector of two values, the first is the posterior mean $\mu_x$, the second is the
posterior variance $\tau_x^2$.
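
The posterior of a normal mean with known variance follows the standard conjugate update, in which precisions add. The following is a minimal sketch that keeps the argument order documented above; post_distr_sketch is a hypothetical name and need not match the chapter's implementation of post_distr.

```r
# Conjugate update for a normal mean with known observation variance sigma:
# posterior precision = n / sigma + 1 / pv.
post_distr_sketch <- function(sigma, n, barx, pm, pv) {
  prec <- n / sigma + 1 / pv
  c(mean = (n * barx / sigma + pm / pv) / prec, variance = 1 / prec)
}

post <- post_distr_sketch(sigma = 4, n = 4, barx = 2, pm = 0, pv = 1)
post
```

The posterior mean is a precision-weighted average of the sample mean and the prior mean, so it always lies between the two.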


two.level.mv.WB: Computes the estimate of the overall mean $\mu$, the group
means $\bar z_i$, the within-group covariance matrix W, and the between-group covariance
matrix B.

Usage: two.level.mv.WB(population,variables,grouping.variable)

Arguments: population, a data frame with N rows and k columns collecting
measurements on m sources with $n_i$ items for each source, $i = 1, \ldots, m$;
variables, a vector containing the column indices of the variables to be used;
grouping.variable, a scalar specifying the variable that is to be used as
the grouping factor.

Output: The group means $\bar z_i$, the estimated overall mean $\mu$, the estimated within-group
covariance matrix W, the estimated between-group covariance matrix B.


two.level.mvn.inv.BF: Computes the BF for investigative purposes from a
two-level model where both the within-source variability and the between-source
variability are normally distributed.

Usage: two.level.mvn.inv.BF(y,W1,W2,B1,B2,mu1,mu2,variables).

Arguments: y, a (n x p) matrix of measurements; W1 and W2, the within-source
covariance matrices; B1 and B2, the between-source covariance matrices; mu1 and mu2, the
overall group means $\mu_1$ and $\mu_2$; variables, a vector containing the column
indices of the variables to be used.

Output: A scalar value.


two.level.mvk.inv.BF: Computes the BF for investigative purposes from a
two-level model where the within-source variability is assumed to be normally
distributed, while the between-source variability is modeled by a kernel density.

Usage: two.level.mvk.inv.BF(y,gmu1,gmu2,W1,W2,B1,B2,h1,h2).

Arguments: y, a (n x p) matrix of measurements; gmu1 and gmu2, the group means
$\bar z_{1i}$ and $\bar z_{2i}$; W1 and W2, the within-source covariance matrices; B1 and B2, the
between-source covariance matrices; h1 and h2, the smoothing parameters $h_1$
and $h_2$.

Output: A scalar value.
